IBM HR Analytics Employee Attrition
¶

Autores: Tamara, Jorge Luis Arias Otero, Diana, Andres, Aristides.🖋️

Análisis de datos sobre el conjunto de datos de rotación de los trabajadores
¶

Objetivo:


El objetivo principal es comprender los factores que contribuyen a la rotación de empleados en una empresa y desarrollar un modelo predictivo para predecir la Attrition.¶

  1. Importación de librerías y carga del DataFrame: En el archivo main.ipynb, se importan las librerías necesarias y se carga el conjunto de datos desde el archivo CSV.
In [ ]:
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import numpy as np # linear algebra
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Import statements required for Plotly 
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.tools as tls


from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, log_loss, classification_report)
from imblearn.over_sampling import SMOTE
import xgboost

# Import and suppress warnings
import warnings
warnings.filterwarnings('ignore')
  1. Análisis exploratorio de datos:
Realizar una exploración inicial del conjunto de datos para comprender su estructura, variables y distribución.¶

Al ejecutar este código, se cargara el archivo CSV en el DataFrame df y obtendremos una vista previa de las primeras filas, información sobre las columnas y tipos de datos, así como estadísticas descriptivas de las variables numéricas.

Este es solo el primer paso del análisis exploratorio de datos. A medida que avancemos, exploraremos más a fondo las variables, realizaremos visualizaciones y extraeremos información valiosa para el análisis

In [ ]:
df = pd.read_csv(r'C:\Users\Admin\Desktop\IBM\Data\WA_Fn-UseC_-HR-Employee-Attrition.csv')

Realizar exploración inicial del conjunto de datos¶

In [ ]:
df.head().style.set_properties(**{'background-color': '#E9F6E2','color': 'black','border-color': '#8b8c8c'})
Out[ ]:
  Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeCount EmployeeNumber EnvironmentSatisfaction Gender HourlyRate JobInvolvement JobLevel JobRole JobSatisfaction MaritalStatus MonthlyIncome MonthlyRate NumCompaniesWorked Over18 OverTime PercentSalaryHike PerformanceRating RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
0 41 Yes Travel_Rarely 1102 Sales 1 2 Life Sciences 1 1 2 Female 94 3 2 Sales Executive 4 Single 5993 19479 8 Y Yes 11 3 1 80 0 8 0 1 6 4 0 5
1 49 No Travel_Frequently 279 Research & Development 8 1 Life Sciences 1 2 3 Male 61 2 2 Research Scientist 2 Married 5130 24907 1 Y No 23 4 4 80 1 10 3 3 10 7 1 7
2 37 Yes Travel_Rarely 1373 Research & Development 2 2 Other 1 4 4 Male 92 2 1 Laboratory Technician 3 Single 2090 2396 6 Y Yes 15 3 2 80 0 7 3 3 0 0 0 0
3 33 No Travel_Frequently 1392 Research & Development 3 4 Life Sciences 1 5 4 Female 56 3 1 Research Scientist 3 Married 2909 23159 1 Y Yes 11 3 3 80 0 8 3 3 8 7 3 0
4 27 No Travel_Rarely 591 Research & Development 2 1 Medical 1 7 1 Male 40 3 1 Laboratory Technician 2 Married 3468 16632 9 Y No 12 3 4 80 1 6 3 3 2 2 2 2

Descripción de las columnas del Dataset
¶

STT Attribute Name Meaning
1 Age Employee's age
2 Gender Employee's Gender
3 BusinessTravel Frequency of employees' business trips
4 DailyRate Daily salary rate for employees
5 Department Office of employees
6 DistanceFromHome Distance from home in miles to work
7 Education Level of education achieved by staff
8 EducationField Employee's field of study
9 EmployeeCount Total number of employees in the organization
10 EmployeeNumber A unique identifier for each employee record
11 EnvironmentSatisfaction Employee satisfaction with their working environment
12 HourlyRate Hourly rate for employees
13 JobInvolvement Level of involvement required for the employee's job
14 JobLevel Employee's level of work
15 JobRole The role of employees in the organization
16 JobSatisfaction Employee satisfaction with their work
17 MaritalStatus Employee's marital status
18 MonthlyIncome Employee's monthly income
19 MonthlyRate Monthly salary rate for employees
20 NumCompaniesWorked Number of companies the employee worked for
21 Over18 Whether the employee is over 18 years old
22 OverTime Do employees work overtime
23 PercentSalaryHike Salary increase rate for employees
24 PerformanceRating The performance rating of the employee
25 RelationshipSatisfaction Employee satisfaction with their relationships
26 StandardHours Standard working hours for employees
27 StockOptionLevel Employee stock option level
28 TotalWorkingYears Total number of years the employee has worked
29 TrainingTimesLastYear Number of times employees were taken to training in the last year
30 WorkLifeBalance Employees' perception of their work-life balance
31 YearsAtCompany Number of years employees have been with the company
32 YearsInCurrentRole Number of years the employee has been in their current role
33 YearsSinceLastPromotion Number of years since employee's last promotion
34 YearsWithCurrManager Number of years an employee has been with their current manager
35 Attrition Does the employee leave the organization

1. Calculando la dimensión del Dataset.¶

In [ ]:
df.shape
Out[ ]:
(1470, 35)

2. Generación de Información Básica de Atributos.¶

In [ ]:
df.info(verbose=False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1470 entries, 0 to 1469
Columns: 35 entries, Age to YearsWithCurrManager
dtypes: int64(26), object(9)
memory usage: 402.1+ KB

💬 Conclusion:

  1. 26 Variables numéricos en el conjunto de datos.
  2. 9 Variables Categóricos.

Muestra aleatoria del conjunto de datos con solo características numéricas.¶

In [ ]:
df.select_dtypes(np.number).sample(5).style.set_properties(**{'background-color': '#E9F6E2',
                                                              'color': 'black','border-color': '#8b8c8c'})
Out[ ]:
  Age DailyRate DistanceFromHome Education EmployeeCount EmployeeNumber EnvironmentSatisfaction HourlyRate JobInvolvement JobLevel JobSatisfaction MonthlyIncome MonthlyRate NumCompaniesWorked PercentSalaryHike PerformanceRating RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
333 43 1001 7 3 1 451 3 43 3 3 1 9985 9262 8 16 3 1 80 1 10 1 2 1 0 0 0
644 31 1222 11 4 1 895 4 48 3 1 4 2356 14871 3 19 3 2 80 1 8 2 3 6 4 0 2
870 35 1361 17 4 1 1218 3 94 3 2 1 8966 21026 3 15 3 4 80 3 15 2 3 7 7 1 7
1422 35 1490 11 4 1 2003 4 43 3 1 3 2660 20232 7 11 3 3 80 1 5 3 3 2 2 2 2
1194 47 1225 2 4 1 1676 2 47 4 4 2 15972 21086 6 14 3 3 80 3 29 2 3 3 2 1 2

💬 Conclusion:

  1. Algunas de las características numéricas almacenan categorías etiquetadas con números.
  2. Entonces, para un mejor análisis, reemplazaremos los valores numéricos etiquetados con valores categóricos apropiados.

4. Etiquetado de Categorías en Características Numéricas.¶

  • Nota:

    • Los valores que se usan a continuación para etiquetar las categorías se proporcionan en la descripción del conjunto de datos en kaggle.
In [ ]:
df["Education"] = df["Education"].replace({1:"Below College",2:"College",3:"Bachelor",4:"Master",5:"Doctor"})
In [ ]:
df["EnvironmentSatisfaction"] = df["EnvironmentSatisfaction"].replace({1:"Low",2:"Medium",3:"High",4:"Very High"})
In [ ]:
df["JobInvolvement"] = df["JobInvolvement"].replace({1:"Low",2:"Medium",3:"High",4:"Very High"})
In [ ]:
df["JobLevel"] = df["JobLevel"].replace({1:"Entry Level",2:"Junior Level",3:"Mid Level",4:"Senior Level",
                                         5:"Executive Level"})
In [ ]:
df["JobSatisfaction"] = df["JobSatisfaction"].replace({1:"Low",2:"Medium",3:"High",4:"Very High"})
In [ ]:
df["PerformanceRating"] = df["PerformanceRating"].replace({1:"Low",2:"Good",3:"Excellent",4:"Outstanding"})
In [ ]:
df["RelationshipSatisfaction"] = df["RelationshipSatisfaction"].replace({1:"Low",2:"Medium",3:"High",4:"Very High"})
In [ ]:
df["WorkLifeBalance"] = df["WorkLifeBalance"].replace({1:"Bad",2:"Good",3:"Better",4:"Best"})

Mostrando una muestra aleatoria de conjunto de datos con solo características categóricas.¶

In [ ]:
df.select_dtypes(include="O").sample(5).style.set_properties(**{'background-color': '#E9F6E2',
                                                                'color': 'black','border-color': '#8b8c8c'})
Out[ ]:
  Attrition BusinessTravel Department Education EducationField EnvironmentSatisfaction Gender JobInvolvement JobLevel JobRole JobSatisfaction MaritalStatus Over18 OverTime PerformanceRating RelationshipSatisfaction WorkLifeBalance
906 No Travel_Rarely Research & Development Bachelor Technical Degree High Female Very High Entry Level Research Scientist High Married Y No Excellent Very High Good
518 No Travel_Rarely Sales Master Marketing Very High Female Medium Junior Level Sales Executive Very High Single Y No Outstanding Low Better
1385 No Travel_Rarely Sales Master Medical Very High Male High Mid Level Sales Executive High Divorced Y No Excellent High Good
470 No Travel_Frequently Sales Bachelor Medical Very High Male High Entry Level Sales Representative Very High Married Y No Excellent High Better
1055 No Travel_Frequently Research & Development Bachelor Medical Medium Male High Senior Level Research Director Low Divorced Y No Excellent Very High Good

Comprobar si hay registros duplicados.¶

In [ ]:
have_duplicate_rows = df.duplicated().any()
have_duplicate_rows
Out[ ]:
False

💬 Conclusion:

  1. Dado que el resultado es falso, podemos decir que no hay registros duplicados presentes en el conjunto de datos.

Calcular el número total de valores nulos y el porcentaje de valores nulos.¶

In [ ]:
missing_df = df.isnull().sum().to_frame().rename(columns={0:"Total No. of Missing Values"})
missing_df["% of Missing Values"] = round((missing_df["Total No. of Missing Values"]/len(df))*100,2)
missing_df
Out[ ]:
Total No. of Missing Values % of Missing Values
Age 0 0.0
Attrition 0 0.0
BusinessTravel 0 0.0
DailyRate 0 0.0
Department 0 0.0
DistanceFromHome 0 0.0
Education 0 0.0
EducationField 0 0.0
EmployeeCount 0 0.0
EmployeeNumber 0 0.0
EnvironmentSatisfaction 0 0.0
Gender 0 0.0
HourlyRate 0 0.0
JobInvolvement 0 0.0
JobLevel 0 0.0
JobRole 0 0.0
JobSatisfaction 0 0.0
MaritalStatus 0 0.0
MonthlyIncome 0 0.0
MonthlyRate 0 0.0
NumCompaniesWorked 0 0.0
Over18 0 0.0
OverTime 0 0.0
PercentSalaryHike 0 0.0
PerformanceRating 0 0.0
RelationshipSatisfaction 0 0.0
StandardHours 0 0.0
StockOptionLevel 0 0.0
TotalWorkingYears 0 0.0
TrainingTimesLastYear 0 0.0
WorkLifeBalance 0 0.0
YearsAtCompany 0 0.0
YearsInCurrentRole 0 0.0
YearsSinceLastPromotion 0 0.0
YearsWithCurrManager 0 0.0

Realización de análisis descriptivos de atributos numéricos.¶

💬 Nota:

- Contar: Muestra el número de valores que no faltan en cada una de las columnas.
- Media: Muestra el promedio de los valores en cada una de las columnas.
- Std: Muestra la desviación estándar de los valores en cada una de las columnas.
- Min: Muestra el valor más pequeño en cada una de las columnas.
- Max: Muestra el valor máximo en cada una de las columnas.
- 25%, 50%, 75%: Los valores corresponden a las divisiones de datos de los percentiles 25%, 50% y 75%.
In [ ]:
df.describe().T
Out[ ]:
count mean std min 25% 50% 75% max
Age 1470.0 36.923810 9.135373 18.0 30.00 36.0 43.00 60.0
DailyRate 1470.0 802.485714 403.509100 102.0 465.00 802.0 1157.00 1499.0
DistanceFromHome 1470.0 9.192517 8.106864 1.0 2.00 7.0 14.00 29.0
EmployeeCount 1470.0 1.000000 0.000000 1.0 1.00 1.0 1.00 1.0
EmployeeNumber 1470.0 1024.865306 602.024335 1.0 491.25 1020.5 1555.75 2068.0
HourlyRate 1470.0 65.891156 20.329428 30.0 48.00 66.0 83.75 100.0
MonthlyIncome 1470.0 6502.931293 4707.956783 1009.0 2911.00 4919.0 8379.00 19999.0
MonthlyRate 1470.0 14313.103401 7117.786044 2094.0 8047.00 14235.5 20461.50 26999.0
NumCompaniesWorked 1470.0 2.693197 2.498009 0.0 1.00 2.0 4.00 9.0
PercentSalaryHike 1470.0 15.209524 3.659938 11.0 12.00 14.0 18.00 25.0
StandardHours 1470.0 80.000000 0.000000 80.0 80.00 80.0 80.00 80.0
StockOptionLevel 1470.0 0.793878 0.852077 0.0 0.00 1.0 1.00 3.0
TotalWorkingYears 1470.0 11.279592 7.780782 0.0 6.00 10.0 15.00 40.0
TrainingTimesLastYear 1470.0 2.799320 1.289271 0.0 2.00 3.0 3.00 6.0
YearsAtCompany 1470.0 7.008163 6.126525 0.0 3.00 5.0 9.00 40.0
YearsInCurrentRole 1470.0 4.229252 3.623137 0.0 2.00 3.0 7.00 18.0
YearsSinceLastPromotion 1470.0 2.187755 3.222430 0.0 0.00 1.0 3.00 15.0
YearsWithCurrManager 1470.0 4.123129 3.568136 0.0 2.00 3.0 7.00 17.0

💬 Conclusion:

1. La edad mínima es de 18 años, lo que implica que todos los empleados son adultos. Por lo tanto, no es necesario el atributo Over18 para nuestro análisis.
2. El valor de desviación estándar de EmployeeCount y StandardHours es 0,00, lo que indica que todos los valores presentes en este atributo son iguales.
3. El atributo EmployeeNumber representa un valor único para cada uno de los empleados, que no proporcionará ninguna información significativa.
4. Dado que estas columnas no proporcionará información significativa en nuestro análisis, simplemente podemos descartarlas.

Eliminación de columnas, lo que no implica ningún conocimiento significativo en nuestro análisis.¶

In [ ]:
cols = ["Over18","EmployeeCount","EmployeeNumber","StandardHours"]

df.drop(columns=cols, inplace=True)

Realización de análisis descriptivos sobre las caracteristicas categóricas.¶

💬 Nota:

- Contar: Muestra el número de valores que no faltan en cada una de las columnas.
- Único: Muestra el número de valores únicos presentes en cada una de las columnas.
- Arriba: Muestra qué valor categórico está más presente en cada una de las columnas.
- Freq : Muestra la frecuencia del valor categórico que más aparece en cada una de las columnas.
In [ ]:
df.describe(include="O").T
Out[ ]:
count unique top freq
Attrition 1470 2 No 1233
BusinessTravel 1470 3 Travel_Rarely 1043
Department 1470 3 Research & Development 961
Education 1470 5 Bachelor 572
EducationField 1470 6 Life Sciences 606
EnvironmentSatisfaction 1470 4 High 453
Gender 1470 2 Male 882
JobInvolvement 1470 4 High 868
JobLevel 1470 5 Entry Level 543
JobRole 1470 9 Sales Executive 326
JobSatisfaction 1470 4 Very High 459
MaritalStatus 1470 3 Married 673
OverTime 1470 2 No 1054
PerformanceRating 1470 2 Excellent 1244
RelationshipSatisfaction 1470 4 High 459
WorkLifeBalance 1470 4 Better 893

💬 Conclusion:

1. Todos los atributos categóricos tienen baja cardinalidad (repeticiones).
2. La columna Deserción y tiempo extra está muy sesgada hacia Sin categoría.
3. El atributo Businesstravel está muy sesgado hacia la categoría Travel_Rarely.

Comprobación de valores únicos de columnas categóricas.¶

In [ ]:
cat_cols = df.select_dtypes(include="O").columns

for column in cat_cols:
    print('Unique values of ', column, set(df[column]))
    print("-"*127)
Unique values of  Attrition {'Yes', 'No'}
-------------------------------------------------------------------------------------------------------------------------------
Unique values of  BusinessTravel {'Travel_Frequently', 'Non-Travel', 'Travel_Rarely'}
-------------------------------------------------------------------------------------------------------------------------------
Unique values of  Department {'Human Resources', 'Sales', 'Research & Development'}
-------------------------------------------------------------------------------------------------------------------------------
Unique values of  Education {'Bachelor', 'Doctor', 'Below College', 'Master', 'College'}
-------------------------------------------------------------------------------------------------------------------------------
Unique values of  EducationField {'Technical Degree', 'Human Resources', 'Life Sciences', 'Marketing', 'Other', 'Medical'}
-------------------------------------------------------------------------------------------------------------------------------
Unique values of  EnvironmentSatisfaction {'Low', 'Medium', 'High', 'Very High'}
-------------------------------------------------------------------------------------------------------------------------------
Unique values of  Gender {'Female', 'Male'}
-------------------------------------------------------------------------------------------------------------------------------
Unique values of  JobInvolvement {'Low', 'Medium', 'High', 'Very High'}
-------------------------------------------------------------------------------------------------------------------------------
Unique values of  JobLevel {'Entry Level', 'Junior Level', 'Mid Level', 'Executive Level', 'Senior Level'}
-------------------------------------------------------------------------------------------------------------------------------
Unique values of  JobRole {'Sales Executive', 'Research Director', 'Human Resources', 'Manufacturing Director', 'Manager', 'Laboratory Technician', 'Research Scientist', 'Healthcare Representative', 'Sales Representative'}
-------------------------------------------------------------------------------------------------------------------------------
Unique values of  JobSatisfaction {'Low', 'Very High', 'High', 'Medium'}
-------------------------------------------------------------------------------------------------------------------------------
Unique values of  MaritalStatus {'Married', 'Single', 'Divorced'}
-------------------------------------------------------------------------------------------------------------------------------
Unique values of  OverTime {'Yes', 'No'}
-------------------------------------------------------------------------------------------------------------------------------
Unique values of  PerformanceRating {'Outstanding', 'Excellent'}
-------------------------------------------------------------------------------------------------------------------------------
Unique values of  RelationshipSatisfaction {'Low', 'Very High', 'High', 'Medium'}
-------------------------------------------------------------------------------------------------------------------------------
Unique values of  WorkLifeBalance {'Better', 'Bad', 'Best', 'Good'}
-------------------------------------------------------------------------------------------------------------------------------

💬 Conclusion:

1. El conjunto de valores de los atributos categóricos es completo y fácil de entender.
2. Por lo tanto, no necesitamos realizar pasos de preprocesamiento para estos atributos.

Analysis Exploratorio de los Datos.
¶

Visualización de la tasa de deserción de empleados.
¶

In [ ]:
#Visualization to show Employee Attrition in Counts.
plt.figure(figsize=(17,6))
plt.subplot(1,2,1)
attrition_rate = df["Attrition"].value_counts()
sns.barplot(x=attrition_rate.index,y=attrition_rate.values,palette=["brown","orange"])
plt.title("Employee Attrition Counts",fontweight="black",size=20,pad=20)
for i, v in enumerate(attrition_rate.values):
    plt.text(i, v, v,ha="center", fontweight='black', fontsize=14)

#Visualization to show Employee Attrition in Percentage.
plt.subplot(1,2,2)
plt.pie(attrition_rate, labels=["No","Yes"], autopct="%.2f%%", textprops={"fontweight":"black","size":15},
        colors = ["brown","orange"],explode=[0,0.1],startangle=90)
center_circle = plt.Circle((0, 0), 0.3, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)
plt.title("Employee Attrition Rate",fontweight="black",size=20,pad=10)
plt.show()

💬 Conclusion:

1. La tasa de deserción de empleados de esta organización es del 16,12 %.
2. Según los expertos en el campo de los Recursos Humanos, dice que la tasa de deserción del 4% al 6% es normal en la organización.
3. Entonces podemos decir que la tasa de deserción de la organización está en un nivel peligroso.
4. Por lo tanto, la organización debe tomar medidas para reducir la tasa de deserción.

Creamos la columna Fecha y se agreguará al archivo CSV original.
¶

Para Analizar aun mas la Attrition de la empresa y tener mejores resultados¶

Tendencias estacionales Eventos externos: Análisis de cohortes: Predicciones futuras:

In [ ]:
import pandas as pd

# Lee el archivo CSV original
file_path = r'C:\Users\Admin\Desktop\IBM\Data\WA_Fn-UseC_-HR-Employee-Attrition.csv'
df = pd.read_csv(file_path)

# Obtén el año actual
current_year = 2018

# Calcula la columna de fecha en base a la columna YearsAtCompany
df['Fecha'] = current_year - df['YearsAtCompany']

# Guarda el DataFrame actualizado en el archivo CSV original
df.to_csv(file_path, index=False)
In [ ]:
import pandas as pd

# Lee el archivo CSV actualizado
file_path = r'C:\Users\Admin\Desktop\IBM\Data\WA_Fn-UseC_-HR-Employee-Attrition.csv'
df = pd.read_csv(file_path)

# Visualizar la columna de fecha
print(df['Fecha'].head())
0    2012
1    2008
2    2018
3    2010
4    2016
Name: Fecha, dtype: int64

Generar el informe de perfilado de datos .
¶

Pandas Profiling¶

In [ ]:
import pandas as pd
from pandas_profiling import ProfileReport
from IPython.display import HTML

# Cargar el conjunto de datos completo
df = pd.read_csv(r'C:\Users\Admin\Desktop\IBM\Data\WA_Fn-UseC_-HR-Employee-Attrition.csv')
cols = ["Over18","EmployeeCount","EmployeeNumber","StandardHours"]

df.drop(columns=cols, inplace=True)

# Generar el informe de perfilado de datos
profile = ProfileReport(df)

# Mostrar el informe en el notebook
HTML(profile.to_html())
Summarize dataset:   0%|          | 0/5 [00:00<?, ?it/s]
Generate report structure:   0%|          | 0/1 [00:00<?, ?it/s]
Render HTML:   0%|          | 0/1 [00:00<?, ?it/s]
Out[ ]:
Pandas Profiling Report
Pandas Profiling Report
  • Overview
  • Variables
  • Interactions
  • Correlations
  • Missing values
  • Sample

Overview

  • Overview
  • Alerts 22
  • Reproduction

Dataset statistics

Number of variables32
Number of observations1470
Missing cells0
Missing cells (%)0.0%
Duplicate rows0
Duplicate rows (%)0.0%
Total size in memory367.6 KiB
Average record size in memory256.1 B

Variable types

Numeric15
Boolean2
Categorical15

Alerts

Age is highly overall correlated with TotalWorkingYearsHigh correlation
MonthlyIncome is highly overall correlated with TotalWorkingYears and 1 other fieldsHigh correlation
PercentSalaryHike is highly overall correlated with PerformanceRatingHigh correlation
TotalWorkingYears is highly overall correlated with Age and 4 other fieldsHigh correlation
YearsAtCompany is highly overall correlated with TotalWorkingYears and 4 other fieldsHigh correlation
YearsInCurrentRole is highly overall correlated with YearsAtCompany and 3 other fieldsHigh correlation
YearsSinceLastPromotion is highly overall correlated with YearsAtCompany and 2 other fieldsHigh correlation
YearsWithCurrManager is highly overall correlated with YearsAtCompany and 2 other fieldsHigh correlation
Fecha is highly overall correlated with TotalWorkingYears and 4 other fieldsHigh correlation
Department is highly overall correlated with EducationField and 1 other fieldsHigh correlation
EducationField is highly overall correlated with DepartmentHigh correlation
JobLevel is highly overall correlated with MonthlyIncome and 2 other fieldsHigh correlation
JobRole is highly overall correlated with Department and 1 other fieldsHigh correlation
MaritalStatus is highly overall correlated with StockOptionLevelHigh correlation
PerformanceRating is highly overall correlated with PercentSalaryHikeHigh correlation
StockOptionLevel is highly overall correlated with MaritalStatusHigh correlation
NumCompaniesWorked has 197 (13.4%) zerosZeros
TrainingTimesLastYear has 54 (3.7%) zerosZeros
YearsAtCompany has 44 (3.0%) zerosZeros
YearsInCurrentRole has 244 (16.6%) zerosZeros
YearsSinceLastPromotion has 581 (39.5%) zerosZeros
YearsWithCurrManager has 263 (17.9%) zerosZeros

Reproduction

Analysis started2023-07-13 23:11:34.002046
Analysis finished2023-07-13 23:12:36.899185
Duration1 minute and 2.9 seconds
Software versionpandas-profiling v3.6.6
Download configurationconfig.json

Variables

Age
Real number (ℝ)

Distinct43
Distinct (%)2.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean36.92381
Minimum18
Maximum60
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size11.6 KiB
2023-07-14T01:12:37.036817image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
  • Statistics
  • Histogram
  • Common values
  • Extreme values

Quantile statistics

Minimum18
5-th percentile24
Q130
median36
Q343
95-th percentile54
Maximum60
Range42
Interquartile range (IQR)13

Descriptive statistics

Standard deviation9.1353735
Coefficient of variation (CV)0.24741146
Kurtosis-0.40414514
Mean36.92381
Median Absolute Deviation (MAD)6
Skewness0.4132863
Sum54278
Variance83.455049
MonotonicityNot monotonic
2023-07-14T01:12:37.223617image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram with fixed size bins (bins=43)
ValueCountFrequency (%)
35 78
 
5.3%
34 77
 
5.2%
36 69
 
4.7%
31 69
 
4.7%
29 68
 
4.6%
32 61
 
4.1%
30 60
 
4.1%
33 58
 
3.9%
38 58
 
3.9%
40 57
 
3.9%
Other values (33) 815
55.4%
  • Minimum 10 values
  • Maximum 10 values
ValueCountFrequency (%)
18 8
 
0.5%
19 9
 
0.6%
20 11
 
0.7%
21 13
 
0.9%
22 16
 
1.1%
23 14
 
1.0%
24 26
1.8%
25 26
1.8%
26 39
2.7%
27 48
3.3%
ValueCountFrequency (%)
60 5
 
0.3%
59 10
0.7%
58 14
1.0%
57 4
 
0.3%
56 14
1.0%
55 22
1.5%
54 18
1.2%
53 19
1.3%
52 18
1.2%
51 19
1.3%

Attrition
Boolean

Distinct2
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size1.6 KiB
False
1233 
True
237 
  • Common Values (Table)
  • Common Values (Plot)
ValueCountFrequency (%)
False 1233
83.9%
True 237
 
16.1%
2023-07-14T01:12:37.406940image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/

BusinessTravel
Categorical

Distinct3
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size11.6 KiB
Travel_Rarely
1043 
Travel_Frequently
277 
Non-Travel
150 
  • Overview
  • Categories
  • Words
  • Characters

Length

Max length17
Median length13
Mean length13.447619
Min length10

Characters and Unicode

Total characters19768
Distinct characters17
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowTravel_Rarely
2nd rowTravel_Frequently
3rd rowTravel_Rarely
4th rowTravel_Frequently
5th rowTravel_Rarely

Common Values

ValueCountFrequency (%)
Travel_Rarely 1043
71.0%
Travel_Frequently 277
 
18.8%
Non-Travel 150
 
10.2%

Length

2023-07-14T01:12:37.564765image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

2023-07-14T01:12:37.753947image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
ValueCountFrequency (%)
travel_rarely 1043
71.0%
travel_frequently 277
 
18.8%
non-travel 150
 
10.2%
  • Characters
  • Categories
  • Scripts
  • Blocks

Most occurring characters

ValueCountFrequency (%)
e 3067
15.5%
r 2790
14.1%
l 2790
14.1%
a 2513
12.7%
T 1470
7.4%
v 1470
7.4%
y 1320
6.7%
_ 1320
6.7%
R 1043
 
5.3%
n 427
 
2.2%
Other values (7) 1558
7.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 15358
77.7%
Uppercase Letter 2940
 
14.9%
Connector Punctuation 1320
 
6.7%
Dash Punctuation 150
 
0.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 3067
20.0%
r 2790
18.2%
l 2790
18.2%
a 2513
16.4%
v 1470
9.6%
y 1320
8.6%
n 427
 
2.8%
q 277
 
1.8%
u 277
 
1.8%
t 277
 
1.8%
Uppercase Letter
ValueCountFrequency (%)
T 1470
50.0%
R 1043
35.5%
F 277
 
9.4%
N 150
 
5.1%
Connector Punctuation
ValueCountFrequency (%)
_ 1320
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 150
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 18298
92.6%
Common 1470
 
7.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 3067
16.8%
r 2790
15.2%
l 2790
15.2%
a 2513
13.7%
T 1470
8.0%
v 1470
8.0%
y 1320
7.2%
R 1043
 
5.7%
n 427
 
2.3%
F 277
 
1.5%
Other values (5) 1131
 
6.2%
Common
ValueCountFrequency (%)
_ 1320
89.8%
- 150
 
10.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 19768
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 3067
15.5%
r 2790
14.1%
l 2790
14.1%
a 2513
12.7%
T 1470
7.4%
v 1470
7.4%
y 1320
6.7%
_ 1320
6.7%
R 1043
 
5.3%
n 427
 
2.2%
Other values (7) 1558
7.9%

DailyRate
Real number (ℝ)

Distinct886
Distinct (%)60.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean802.48571
Minimum102
Maximum1499
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size11.6 KiB
2023-07-14T01:12:37.928797image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
  • Statistics
  • Histogram
  • Common values
  • Extreme values

Quantile statistics

Minimum102
5-th percentile165.35
Q1465
median802
Q31157
95-th percentile1424.1
Maximum1499
Range1397
Interquartile range (IQR)692

Descriptive statistics

Standard deviation403.5091
Coefficient of variation (CV)0.50282403
Kurtosis-1.2038228
Mean802.48571
Median Absolute Deviation (MAD)344
Skewness-0.0035185684
Sum1179654
Variance162819.59
MonotonicityNot monotonic
2023-07-14T01:12:38.123276image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
691 6
 
0.4%
408 5
 
0.3%
530 5
 
0.3%
1329 5
 
0.3%
1082 5
 
0.3%
329 5
 
0.3%
829 4
 
0.3%
1469 4
 
0.3%
267 4
 
0.3%
217 4
 
0.3%
Other values (876) 1423
96.8%
  • Minimum 10 values
  • Maximum 10 values
ValueCountFrequency (%)
102 1
 
0.1%
103 1
 
0.1%
104 1
 
0.1%
105 1
 
0.1%
106 1
 
0.1%
107 1
 
0.1%
109 1
 
0.1%
111 3
0.2%
115 1
 
0.1%
116 2
0.1%
ValueCountFrequency (%)
1499 1
 
0.1%
1498 1
 
0.1%
1496 2
0.1%
1495 3
0.2%
1492 1
 
0.1%
1490 4
0.3%
1488 1
 
0.1%
1485 3
0.2%
1482 1
 
0.1%
1480 2
0.1%

Department
Categorical

Distinct3
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size11.6 KiB
Research & Development
961 
Sales
446 
Human Resources
 
63
  • Overview
  • Categories
  • Words
  • Characters

Length

Max length22
Median length22
Mean length16.542177
Min length5

Characters and Unicode

Total characters24317
Distinct characters20
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowSales
2nd rowResearch & Development
3rd rowResearch & Development
4th rowResearch & Development
5th rowResearch & Development

Common Values

ValueCountFrequency (%)
Research & Development 961
65.4%
Sales 446
30.3%
Human Resources 63
 
4.3%

Length

2023-07-14T01:12:38.314764image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

2023-07-14T01:12:38.500266image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
ValueCountFrequency (%)
research 961
27.8%
961
27.8%
development 961
27.8%
sales 446
12.9%
human 63
 
1.8%
resources 63
 
1.8%
  • Characters
  • Categories
  • Scripts
  • Blocks

Most occurring characters

ValueCountFrequency (%)
e 5377
22.1%
1985
 
8.2%
s 1533
 
6.3%
a 1470
 
6.0%
l 1407
 
5.8%
R 1024
 
4.2%
r 1024
 
4.2%
c 1024
 
4.2%
n 1024
 
4.2%
m 1024
 
4.2%
Other values (10) 7425
30.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 18877
77.6%
Uppercase Letter 2494
 
10.3%
Space Separator 1985
 
8.2%
Other Punctuation 961
 
4.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 5377
28.5%
s 1533
 
8.1%
a 1470
 
7.8%
l 1407
 
7.5%
r 1024
 
5.4%
c 1024
 
5.4%
n 1024
 
5.4%
m 1024
 
5.4%
o 1024
 
5.4%
p 961
 
5.1%
Other values (4) 3009
15.9%
Uppercase Letter
ValueCountFrequency (%)
R 1024
41.1%
D 961
38.5%
S 446
17.9%
H 63
 
2.5%
Space Separator
ValueCountFrequency (%)
1985
100.0%
Other Punctuation
ValueCountFrequency (%)
& 961
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 21371
87.9%
Common 2946
 
12.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 5377
25.2%
s 1533
 
7.2%
a 1470
 
6.9%
l 1407
 
6.6%
R 1024
 
4.8%
r 1024
 
4.8%
c 1024
 
4.8%
n 1024
 
4.8%
m 1024
 
4.8%
o 1024
 
4.8%
Other values (8) 5440
25.5%
Common
ValueCountFrequency (%)
1985
67.4%
& 961
32.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 24317
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 5377
22.1%
1985
 
8.2%
s 1533
 
6.3%
a 1470
 
6.0%
l 1407
 
5.8%
R 1024
 
4.2%
r 1024
 
4.2%
c 1024
 
4.2%
n 1024
 
4.2%
m 1024
 
4.2%
Other values (10) 7425
30.5%

DistanceFromHome
Real number (ℝ)

Distinct29
Distinct (%)2.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean9.192517
Minimum1
Maximum29
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size11.6 KiB
2023-07-14T01:12:38.659308image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
  • Statistics
  • Histogram
  • Common values
  • Extreme values

Quantile statistics

Minimum1
5-th percentile1
Q12
median7
Q314
95-th percentile26
Maximum29
Range28
Interquartile range (IQR)12

Descriptive statistics

Standard deviation8.1068644
Coefficient of variation (CV)0.88189823
Kurtosis-0.2248334
Mean9.192517
Median Absolute Deviation (MAD)5
Skewness0.958118
Sum13513
Variance65.721251
MonotonicityNot monotonic
2023-07-14T01:12:38.821699image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram with fixed size bins (bins=29)
ValueCountFrequency (%)
2 211
14.4%
1 208
14.1%
10 86
 
5.9%
9 85
 
5.8%
3 84
 
5.7%
7 84
 
5.7%
8 80
 
5.4%
5 65
 
4.4%
4 64
 
4.4%
6 59
 
4.0%
Other values (19) 444
30.2%
  • Minimum 10 values
  • Maximum 10 values
ValueCountFrequency (%)
1 208
14.1%
2 211
14.4%
3 84
 
5.7%
4 64
 
4.4%
5 65
 
4.4%
6 59
 
4.0%
7 84
 
5.7%
8 80
 
5.4%
9 85
5.8%
10 86
5.9%
ValueCountFrequency (%)
29 27
1.8%
28 23
1.6%
27 12
0.8%
26 25
1.7%
25 25
1.7%
24 28
1.9%
23 27
1.8%
22 19
1.3%
21 18
1.2%
20 25
1.7%

Education
Categorical

Distinct5
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size11.6 KiB
3
572 
4
398 
2
282 
1
170 
5
 
48
  • Overview
  • Categories
  • Words
  • Characters

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1470
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2
2nd row1
3rd row2
4th row4
5th row1

Common Values

ValueCountFrequency (%)
3 572
38.9%
4 398
27.1%
2 282
19.2%
1 170
 
11.6%
5 48
 
3.3%

Length

2023-07-14T01:12:38.993243image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

2023-07-14T01:12:39.193706image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
ValueCountFrequency (%)
3 572
38.9%
4 398
27.1%
2 282
19.2%
1 170
 
11.6%
5 48
 
3.3%
  • Characters
  • Categories
  • Scripts
  • Blocks

Most occurring characters

ValueCountFrequency (%)
3 572
38.9%
4 398
27.1%
2 282
19.2%
1 170
 
11.6%
5 48
 
3.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1470
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 572
38.9%
4 398
27.1%
2 282
19.2%
1 170
 
11.6%
5 48
 
3.3%

Most occurring scripts

ValueCountFrequency (%)
Common 1470
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 572
38.9%
4 398
27.1%
2 282
19.2%
1 170
 
11.6%
5 48
 
3.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1470
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 572
38.9%
4 398
27.1%
2 282
19.2%
1 170
 
11.6%
5 48
 
3.3%

EducationField
Categorical

Distinct6
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size11.6 KiB
Life Sciences
606 
Medical
464 
Marketing
159 
Technical Degree
132 
Other
82 
  • Overview
  • Categories
  • Words
  • Characters

Length

Max length16
Median length15
Mean length10.533333
Min length5

Characters and Unicode

Total characters15484
Distinct characters26
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowLife Sciences
2nd rowLife Sciences
3rd rowOther
4th rowLife Sciences
5th rowMedical

Common Values

ValueCountFrequency (%)
Life Sciences 606
41.2%
Medical 464
31.6%
Marketing 159
 
10.8%
Technical Degree 132
 
9.0%
Other 82
 
5.6%
Human Resources 27
 
1.8%

Length

2023-07-14T01:12:39.405139image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

2023-07-14T01:12:39.634638image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
ValueCountFrequency (%)
life 606
27.1%
sciences 606
27.1%
medical 464
20.8%
marketing 159
 
7.1%
technical 132
 
5.9%
degree 132
 
5.9%
other 82
 
3.7%
human 27
 
1.2%
resources 27
 
1.2%
  • Characters
  • Categories
  • Scripts
  • Blocks

Most occurring characters

ValueCountFrequency (%)
e 3105
20.1%
i 1967
12.7%
c 1967
12.7%
n 924
 
6.0%
a 782
 
5.1%
765
 
4.9%
s 660
 
4.3%
M 623
 
4.0%
L 606
 
3.9%
f 606
 
3.9%
Other values (16) 3479
22.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 12484
80.6%
Uppercase Letter 2235
 
14.4%
Space Separator 765
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 3105
24.9%
i 1967
15.8%
c 1967
15.8%
n 924
 
7.4%
a 782
 
6.3%
s 660
 
5.3%
f 606
 
4.9%
l 596
 
4.8%
d 464
 
3.7%
r 400
 
3.2%
Other values (7) 1013
 
8.1%
Uppercase Letter
ValueCountFrequency (%)
M 623
27.9%
L 606
27.1%
S 606
27.1%
T 132
 
5.9%
D 132
 
5.9%
O 82
 
3.7%
H 27
 
1.2%
R 27
 
1.2%
Space Separator
ValueCountFrequency (%)
765
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 14719
95.1%
Common 765
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 3105
21.1%
i 1967
13.4%
c 1967
13.4%
n 924
 
6.3%
a 782
 
5.3%
s 660
 
4.5%
M 623
 
4.2%
L 606
 
4.1%
f 606
 
4.1%
S 606
 
4.1%
Other values (15) 2873
19.5%
Common
ValueCountFrequency (%)
765
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 15484
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 3105
20.1%
i 1967
12.7%
c 1967
12.7%
n 924
 
6.0%
a 782
 
5.1%
765
 
4.9%
s 660
 
4.3%
M 623
 
4.0%
L 606
 
3.9%
f 606
 
3.9%
Other values (16) 3479
22.5%

EnvironmentSatisfaction
Categorical

Distinct4
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size11.6 KiB
3
453 
4
446 
2
287 
1
284 
  • Overview
  • Categories
  • Words
  • Characters

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1470
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2
2nd row3
3rd row4
4th row4
5th row1

Common Values

ValueCountFrequency (%)
3 453
30.8%
4 446
30.3%
2 287
19.5%
1 284
19.3%

Length

2023-07-14T01:12:39.863676image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

2023-07-14T01:12:40.064178image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
ValueCountFrequency (%)
3 453
30.8%
4 446
30.3%
2 287
19.5%
1 284
19.3%
  • Characters
  • Categories
  • Scripts
  • Blocks

Most occurring characters

ValueCountFrequency (%)
3 453
30.8%
4 446
30.3%
2 287
19.5%
1 284
19.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1470
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 453
30.8%
4 446
30.3%
2 287
19.5%
1 284
19.3%

Most occurring scripts

ValueCountFrequency (%)
Common 1470
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 453
30.8%
4 446
30.3%
2 287
19.5%
1 284
19.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1470
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 453
30.8%
4 446
30.3%
2 287
19.5%
1 284
19.3%

Gender
Categorical

Distinct2
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size11.6 KiB
Male
882 
Female
588 
  • Overview
  • Categories
  • Words
  • Characters

Length

Max length6
Median length4
Mean length4.8
Min length4

Characters and Unicode

Total characters7056
Distinct characters6
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowFemale
2nd rowMale
3rd rowMale
4th rowFemale
5th rowMale

Common Values

ValueCountFrequency (%)
Male 882
60.0%
Female 588
40.0%

Length

2023-07-14T01:12:40.310388image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

2023-07-14T01:12:40.841965image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
ValueCountFrequency (%)
male 882
60.0%
female 588
40.0%
  • Characters
  • Categories
  • Scripts
  • Blocks

Most occurring characters

ValueCountFrequency (%)
e 2058
29.2%
a 1470
20.8%
l 1470
20.8%
M 882
12.5%
F 588
 
8.3%
m 588
 
8.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5586
79.2%
Uppercase Letter 1470
 
20.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 2058
36.8%
a 1470
26.3%
l 1470
26.3%
m 588
 
10.5%
Uppercase Letter
ValueCountFrequency (%)
M 882
60.0%
F 588
40.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 7056
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 2058
29.2%
a 1470
20.8%
l 1470
20.8%
M 882
12.5%
F 588
 
8.3%
m 588
 
8.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7056
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 2058
29.2%
a 1470
20.8%
l 1470
20.8%
M 882
12.5%
F 588
 
8.3%
m 588
 
8.3%

HourlyRate
Real number (ℝ)

Distinct71
Distinct (%)4.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean65.891156
Minimum30
Maximum100
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size11.6 KiB
2023-07-14T01:12:41.245888image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
  • Statistics
  • Histogram
  • Common values
  • Extreme values

Quantile statistics

Minimum30
5-th percentile33
Q148
median66
Q383.75
95-th percentile97
Maximum100
Range70
Interquartile range (IQR)35.75

Descriptive statistics

Standard deviation20.329428
Coefficient of variation (CV)0.30853044
Kurtosis-1.1963985
Mean65.891156
Median Absolute Deviation (MAD)18
Skewness-0.032310953
Sum96860
Variance413.28563
MonotonicityNot monotonic
2023-07-14T01:12:41.448343image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
66 29
 
2.0%
98 28
 
1.9%
42 28
 
1.9%
48 28
 
1.9%
84 28
 
1.9%
57 27
 
1.8%
79 27
 
1.8%
96 27
 
1.8%
54 26
 
1.8%
52 26
 
1.8%
Other values (61) 1196
81.4%
  • Minimum 10 values
  • Maximum 10 values
ValueCountFrequency (%)
30 19
1.3%
31 15
1.0%
32 24
1.6%
33 19
1.3%
34 12
0.8%
35 18
1.2%
36 18
1.2%
37 18
1.2%
38 13
0.9%
39 17
1.2%
ValueCountFrequency (%)
100 19
1.3%
99 20
1.4%
98 28
1.9%
97 21
1.4%
96 27
1.8%
95 23
1.6%
94 22
1.5%
93 16
1.1%
92 25
1.7%
91 18
1.2%

JobInvolvement
Categorical

Distinct4
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size11.6 KiB
3
868 
2
375 
4
144 
1
 
83
  • Overview
  • Categories
  • Words
  • Characters

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1470
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row3
2nd row2
3rd row2
4th row3
5th row3

Common Values

ValueCountFrequency (%)
3 868
59.0%
2 375
25.5%
4 144
 
9.8%
1 83
 
5.6%

Length

2023-07-14T01:12:41.642824image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

2023-07-14T01:12:41.826355image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
ValueCountFrequency (%)
3 868
59.0%
2 375
25.5%
4 144
 
9.8%
1 83
 
5.6%
  • Characters
  • Categories
  • Scripts
  • Blocks

Most occurring characters

ValueCountFrequency (%)
3 868
59.0%
2 375
25.5%
4 144
 
9.8%
1 83
 
5.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1470
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 868
59.0%
2 375
25.5%
4 144
 
9.8%
1 83
 
5.6%

Most occurring scripts

ValueCountFrequency (%)
Common 1470
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 868
59.0%
2 375
25.5%
4 144
 
9.8%
1 83
 
5.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1470
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 868
59.0%
2 375
25.5%
4 144
 
9.8%
1 83
 
5.6%

JobLevel
Categorical

Distinct5
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size11.6 KiB
1
543 
2
534 
3
218 
4
106 
5
69 
  • Overview
  • Categories
  • Words
  • Characters

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1470
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2
2nd row2
3rd row1
4th row1
5th row1

Common Values

ValueCountFrequency (%)
1 543
36.9%
2 534
36.3%
3 218
14.8%
4 106
 
7.2%
5 69
 
4.7%

Length

2023-07-14T01:12:42.025798image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

2023-07-14T01:12:42.234245image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
ValueCountFrequency (%)
1 543
36.9%
2 534
36.3%
3 218
14.8%
4 106
 
7.2%
5 69
 
4.7%
  • Characters
  • Categories
  • Scripts
  • Blocks

Most occurring characters

ValueCountFrequency (%)
1 543
36.9%
2 534
36.3%
3 218
14.8%
4 106
 
7.2%
5 69
 
4.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1470
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 543
36.9%
2 534
36.3%
3 218
14.8%
4 106
 
7.2%
5 69
 
4.7%

Most occurring scripts

ValueCountFrequency (%)
Common 1470
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 543
36.9%
2 534
36.3%
3 218
14.8%
4 106
 
7.2%
5 69
 
4.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1470
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 543
36.9%
2 534
36.3%
3 218
14.8%
4 106
 
7.2%
5 69
 
4.7%

JobRole
Categorical

Distinct9
Distinct (%)0.6%
Missing0
Missing (%)0.0%
Memory size11.6 KiB
Sales Executive
326 
Research Scientist
292 
Laboratory Technician
259 
Manufacturing Director
145 
Healthcare Representative
131 
Other values (4)
317 
  • Overview
  • Categories
  • Words
  • Characters

Length

Max length25
Median length21
Mean length18.070748
Min length7

Characters and Unicode

Total characters26564
Distinct characters29
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowSales Executive
2nd rowResearch Scientist
3rd rowLaboratory Technician
4th rowResearch Scientist
5th rowLaboratory Technician

Common Values

ValueCountFrequency (%)
Sales Executive 326
22.2%
Research Scientist 292
19.9%
Laboratory Technician 259
17.6%
Manufacturing Director 145
9.9%
Healthcare Representative 131
8.9%
Manager 102
 
6.9%
Sales Representative 83
 
5.6%
Research Director 80
 
5.4%
Human Resources 52
 
3.5%

Length

2023-07-14T01:12:42.562364image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

2023-07-14T01:12:42.942346image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
ValueCountFrequency (%)
sales 409
14.4%
research 372
13.1%
executive 326
11.5%
scientist 292
10.3%
laboratory 259
9.1%
technician 259
9.1%
director 225
7.9%
representative 214
7.5%
manufacturing 145
 
5.1%
healthcare 131
 
4.6%
Other values (3) 206
7.3%
  • Characters
  • Categories
  • Scripts
  • Blocks

Most occurring characters

ValueCountFrequency (%)
e 3905
14.7%
a 2580
 
9.7%
t 2098
 
7.9%
c 2061
 
7.8%
i 2012
 
7.6%
r 1984
 
7.5%
n 1468
 
5.5%
s 1391
 
5.2%
1368
 
5.1%
o 795
 
3.0%
Other values (19) 6902
26.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 22358
84.2%
Uppercase Letter 2838
 
10.7%
Space Separator 1368
 
5.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 3905
17.5%
a 2580
11.5%
t 2098
9.4%
c 2061
9.2%
i 2012
9.0%
r 1984
8.9%
n 1468
 
6.6%
s 1391
 
6.2%
o 795
 
3.6%
h 762
 
3.4%
Other values (10) 3302
14.8%
Uppercase Letter
ValueCountFrequency (%)
S 701
24.7%
R 638
22.5%
E 326
11.5%
L 259
 
9.1%
T 259
 
9.1%
M 247
 
8.7%
D 225
 
7.9%
H 183
 
6.4%
Space Separator
ValueCountFrequency (%)
1368
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 25196
94.9%
Common 1368
 
5.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 3905
15.5%
a 2580
10.2%
t 2098
 
8.3%
c 2061
 
8.2%
i 2012
 
8.0%
r 1984
 
7.9%
n 1468
 
5.8%
s 1391
 
5.5%
o 795
 
3.2%
h 762
 
3.0%
Other values (18) 6140
24.4%
Common
ValueCountFrequency (%)
1368
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 26564
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 3905
14.7%
a 2580
 
9.7%
t 2098
 
7.9%
c 2061
 
7.8%
i 2012
 
7.6%
r 1984
 
7.5%
n 1468
 
5.5%
s 1391
 
5.2%
1368
 
5.1%
o 795
 
3.0%
Other values (19) 6902
26.0%

JobSatisfaction
Categorical

Distinct4
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size11.6 KiB
4
459 
3
442 
1
289 
2
280 
  • Overview
  • Categories
  • Words
  • Characters

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1470
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row4
2nd row2
3rd row3
4th row3
5th row2

Common Values

ValueCountFrequency (%)
4 459
31.2%
3 442
30.1%
1 289
19.7%
2 280
19.0%

Length

2023-07-14T01:12:43.410096image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

2023-07-14T01:12:43.731237image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
ValueCountFrequency (%)
4 459
31.2%
3 442
30.1%
1 289
19.7%
2 280
19.0%
  • Characters
  • Categories
  • Scripts
  • Blocks

Most occurring characters

ValueCountFrequency (%)
4 459
31.2%
3 442
30.1%
1 289
19.7%
2 280
19.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1470
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
4 459
31.2%
3 442
30.1%
1 289
19.7%
2 280
19.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1470
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
4 459
31.2%
3 442
30.1%
1 289
19.7%
2 280
19.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1470
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4 459
31.2%
3 442
30.1%
1 289
19.7%
2 280
19.0%

MaritalStatus
Categorical

Distinct3
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size11.6 KiB
Married
673 
Single
470 
Divorced
327 
  • Overview
  • Categories
  • Words
  • Characters

Length

Max length8
Median length7
Mean length6.9027211
Min length6

Characters and Unicode

Total characters10147
Distinct characters14
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowSingle
2nd rowMarried
3rd rowSingle
4th rowMarried
5th rowMarried

Common Values

ValueCountFrequency (%)
Married 673
45.8%
Single 470
32.0%
Divorced 327
22.2%

Length

2023-07-14T01:12:44.175051image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

2023-07-14T01:12:44.390472image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
ValueCountFrequency (%)
married 673
45.8%
single 470
32.0%
divorced 327
22.2%
  • Characters
  • Categories
  • Scripts
  • Blocks

Most occurring characters

ValueCountFrequency (%)
r 1673
16.5%
i 1470
14.5%
e 1470
14.5%
d 1000
9.9%
M 673
6.6%
a 673
6.6%
S 470
 
4.6%
n 470
 
4.6%
g 470
 
4.6%
l 470
 
4.6%
Other values (4) 1308
12.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 8677
85.5%
Uppercase Letter 1470
 
14.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 1673
19.3%
i 1470
16.9%
e 1470
16.9%
d 1000
11.5%
a 673
7.8%
n 470
 
5.4%
g 470
 
5.4%
l 470
 
5.4%
v 327
 
3.8%
o 327
 
3.8%
Uppercase Letter
ValueCountFrequency (%)
M 673
45.8%
S 470
32.0%
D 327
22.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 10147
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 1673
16.5%
i 1470
14.5%
e 1470
14.5%
d 1000
9.9%
M 673
6.6%
a 673
6.6%
S 470
 
4.6%
n 470
 
4.6%
g 470
 
4.6%
l 470
 
4.6%
Other values (4) 1308
12.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10147
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 1673
16.5%
i 1470
14.5%
e 1470
14.5%
d 1000
9.9%
M 673
6.6%
a 673
6.6%
S 470
 
4.6%
n 470
 
4.6%
g 470
 
4.6%
l 470
 
4.6%
Other values (4) 1308
12.9%

MonthlyIncome
Real number (ℝ)

Distinct1349
Distinct (%)91.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean6502.9313
Minimum1009
Maximum19999
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size11.6 KiB
2023-07-14T01:12:44.635817image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
  • Statistics
  • Histogram
  • Common values
  • Extreme values

Quantile statistics

Minimum1009
5-th percentile2097.9
Q12911
median4919
Q38379
95-th percentile17821.35
Maximum19999
Range18990
Interquartile range (IQR)5468

Descriptive statistics

Standard deviation4707.9568
Coefficient of variation (CV)0.72397455
Kurtosis1.0052327
Mean6502.9313
Median Absolute Deviation (MAD)2199
Skewness1.3698167
Sum9559309
Variance22164857
MonotonicityNot monotonic
2023-07-14T01:12:44.860217image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
2342 4
 
0.3%
6142 3
 
0.2%
2741 3
 
0.2%
2559 3
 
0.2%
2610 3
 
0.2%
2451 3
 
0.2%
5562 3
 
0.2%
3452 3
 
0.2%
2380 3
 
0.2%
6347 3
 
0.2%
Other values (1339) 1439
97.9%
  • Minimum 10 values
  • Maximum 10 values
ValueCountFrequency (%)
1009 1
0.1%
1051 1
0.1%
1052 1
0.1%
1081 1
0.1%
1091 1
0.1%
1102 1
0.1%
1118 1
0.1%
1129 1
0.1%
1200 1
0.1%
1223 1
0.1%
ValueCountFrequency (%)
19999 1
0.1%
19973 1
0.1%
19943 1
0.1%
19926 1
0.1%
19859 1
0.1%
19847 1
0.1%
19845 1
0.1%
19833 1
0.1%
19740 1
0.1%
19717 1
0.1%

MonthlyRate
Real number (ℝ)

Distinct1427
Distinct (%)97.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean14313.103
Minimum2094
Maximum26999
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size11.6 KiB
2023-07-14T01:12:45.082621image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
  • Statistics
  • Histogram
  • Common values
  • Extreme values

Quantile statistics

Minimum2094
5-th percentile3384.55
Q18047
median14235.5
Q320461.5
95-th percentile25431.9
Maximum26999
Range24905
Interquartile range (IQR)12414.5

Descriptive statistics

Standard deviation7117.786
Coefficient of variation (CV)0.4972916
Kurtosis-1.2149561
Mean14313.103
Median Absolute Deviation (MAD)6206.5
Skewness0.018577808
Sum21040262
Variance50662878
MonotonicityNot monotonic
2023-07-14T01:12:45.303008image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
4223 3
 
0.2%
9150 3
 
0.2%
9558 2
 
0.1%
12858 2
 
0.1%
22074 2
 
0.1%
25326 2
 
0.1%
9096 2
 
0.1%
13008 2
 
0.1%
12355 2
 
0.1%
7744 2
 
0.1%
Other values (1417) 1448
98.5%
  • Minimum 10 values
  • Maximum 10 values
ValueCountFrequency (%)
2094 1
0.1%
2097 1
0.1%
2104 1
0.1%
2112 1
0.1%
2122 1
0.1%
2125 2
0.1%
2137 1
0.1%
2227 1
0.1%
2243 1
0.1%
2253 1
0.1%
ValueCountFrequency (%)
26999 1
0.1%
26997 1
0.1%
26968 1
0.1%
26959 1
0.1%
26956 1
0.1%
26933 1
0.1%
26914 1
0.1%
26897 1
0.1%
26894 1
0.1%
26862 1
0.1%

NumCompaniesWorked
Real number (ℝ)

Distinct10
Distinct (%)0.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.6931973
Minimum0
Maximum9
Zeros197
Zeros (%)13.4%
Negative0
Negative (%)0.0%
Memory size11.6 KiB
2023-07-14T01:12:45.501476image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
  • Statistics
  • Histogram
  • Common values
  • Extreme values

Quantile statistics

Minimum0
5-th percentile0
Q11
median2
Q34
95-th percentile8
Maximum9
Range9
Interquartile range (IQR)3

Descriptive statistics

Standard deviation2.498009
Coefficient of variation (CV)0.92752545
Kurtosis0.010213817
Mean2.6931973
Median Absolute Deviation (MAD)1
Skewness1.0264711
Sum3959
Variance6.240049
MonotonicityNot monotonic
2023-07-14T01:12:45.740837image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%)
1 521
35.4%
0 197
 
13.4%
3 159
 
10.8%
2 146
 
9.9%
4 139
 
9.5%
7 74
 
5.0%
6 70
 
4.8%
5 63
 
4.3%
9 52
 
3.5%
8 49
 
3.3%
  • Minimum 10 values
  • Maximum 10 values
ValueCountFrequency (%)
0 197
 
13.4%
1 521
35.4%
2 146
 
9.9%
3 159
 
10.8%
4 139
 
9.5%
5 63
 
4.3%
6 70
 
4.8%
7 74
 
5.0%
8 49
 
3.3%
9 52
 
3.5%
ValueCountFrequency (%)
9 52
 
3.5%
8 49
 
3.3%
7 74
 
5.0%
6 70
 
4.8%
5 63
 
4.3%
4 139
 
9.5%
3 159
 
10.8%
2 146
 
9.9%
1 521
35.4%
0 197
 
13.4%

OverTime
Boolean

Distinct2
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size1.6 KiB
False
1054 
True
416 
  • Common Values (Table)
  • Common Values (Plot)
ValueCountFrequency (%)
False 1054
71.7%
True 416
 
28.3%
2023-07-14T01:12:46.102868image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/

PercentSalaryHike
Real number (ℝ)

Distinct15
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean15.209524
Minimum11
Maximum25
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size11.6 KiB
2023-07-14T01:12:46.312307image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
  • Statistics
  • Histogram
  • Common values
  • Extreme values

Quantile statistics

Minimum11
5-th percentile11
Q112
median14
Q318
95-th percentile22
Maximum25
Range14
Interquartile range (IQR)6

Descriptive statistics

Standard deviation3.6599377
Coefficient of variation (CV)0.2406346
Kurtosis-0.30059822
Mean15.209524
Median Absolute Deviation (MAD)2
Skewness0.82112798
Sum22358
Variance13.395144
MonotonicityNot monotonic
2023-07-14T01:12:46.456922image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram with fixed size bins (bins=15)
ValueCountFrequency (%)
11 210
14.3%
13 209
14.2%
14 201
13.7%
12 198
13.5%
15 101
6.9%
18 89
6.1%
17 82
 
5.6%
16 78
 
5.3%
19 76
 
5.2%
22 56
 
3.8%
Other values (5) 170
11.6%
  • Minimum 10 values
  • Maximum 10 values
ValueCountFrequency (%)
11 210
14.3%
12 198
13.5%
13 209
14.2%
14 201
13.7%
15 101
6.9%
16 78
 
5.3%
17 82
 
5.6%
18 89
6.1%
19 76
 
5.2%
20 55
 
3.7%
ValueCountFrequency (%)
25 18
 
1.2%
24 21
 
1.4%
23 28
 
1.9%
22 56
3.8%
21 48
3.3%
20 55
3.7%
19 76
5.2%
18 89
6.1%
17 82
5.6%
16 78
5.3%

PerformanceRating
Categorical

Distinct2
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size11.6 KiB
3
1244 
4
226 
  • Overview
  • Categories
  • Words
  • Characters

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1470
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row3
2nd row4
3rd row3
4th row3
5th row3

Common Values

ValueCountFrequency (%)
3 1244
84.6%
4 226
 
15.4%

Length

2023-07-14T01:12:46.625737image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

2023-07-14T01:12:46.816991image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
ValueCountFrequency (%)
3 1244
84.6%
4 226
 
15.4%
  • Characters
  • Categories
  • Scripts
  • Blocks

Most occurring characters

ValueCountFrequency (%)
3 1244
84.6%
4 226
 
15.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1470
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 1244
84.6%
4 226
 
15.4%

Most occurring scripts

ValueCountFrequency (%)
Common 1470
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 1244
84.6%
4 226
 
15.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1470
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 1244
84.6%
4 226
 
15.4%

RelationshipSatisfaction
Categorical

Distinct4
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size11.6 KiB
3
459 
4
432 
2
303 
1
276 
  • Overview
  • Categories
  • Words
  • Characters

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1470
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row4
3rd row2
4th row3
5th row4

Common Values

ValueCountFrequency (%)
3 459
31.2%
4 432
29.4%
2 303
20.6%
1 276
18.8%

Length

2023-07-14T01:12:46.961582image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

2023-07-14T01:12:47.613836image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
ValueCountFrequency (%)
3 459
31.2%
4 432
29.4%
2 303
20.6%
1 276
18.8%
  • Characters
  • Categories
  • Scripts
  • Blocks

Most occurring characters

ValueCountFrequency (%)
3 459
31.2%
4 432
29.4%
2 303
20.6%
1 276
18.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1470
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 459
31.2%
4 432
29.4%
2 303
20.6%
1 276
18.8%

Most occurring scripts

ValueCountFrequency (%)
Common 1470
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 459
31.2%
4 432
29.4%
2 303
20.6%
1 276
18.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1470
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 459
31.2%
4 432
29.4%
2 303
20.6%
1 276
18.8%

StockOptionLevel
Categorical

Distinct4
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size11.6 KiB
0
631 
1
596 
2
158 
3
85 
  • Overview
  • Categories
  • Words
  • Characters

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1470
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row1
3rd row0
4th row0
5th row1

Common Values

ValueCountFrequency (%)
0 631
42.9%
1 596
40.5%
2 158
 
10.7%
3 85
 
5.8%

Length

2023-07-14T01:12:47.805326image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

2023-07-14T01:12:47.992360image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
ValueCountFrequency (%)
0 631
42.9%
1 596
40.5%
2 158
 
10.7%
3 85
 
5.8%
  • Characters
  • Categories
  • Scripts
  • Blocks

Most occurring characters

ValueCountFrequency (%)
0 631
42.9%
1 596
40.5%
2 158
 
10.7%
3 85
 
5.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1470
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 631
42.9%
1 596
40.5%
2 158
 
10.7%
3 85
 
5.8%

Most occurring scripts

ValueCountFrequency (%)
Common 1470
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 631
42.9%
1 596
40.5%
2 158
 
10.7%
3 85
 
5.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1470
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 631
42.9%
1 596
40.5%
2 158
 
10.7%
3 85
 
5.8%

TotalWorkingYears
Real number (ℝ)

Distinct40
Distinct (%)2.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean11.279592
Minimum0
Maximum40
Zeros11
Zeros (%)0.7%
Negative0
Negative (%)0.0%
Memory size11.6 KiB
2023-07-14T01:12:48.166869image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
  • Statistics
  • Histogram
  • Common values
  • Extreme values

Quantile statistics

Minimum0
5-th percentile1
Q16
median10
Q315
95-th percentile28
Maximum40
Range40
Interquartile range (IQR)9

Descriptive statistics

Standard deviation7.7807817
Coefficient of variation (CV)0.68981057
Kurtosis0.91826954
Mean11.279592
Median Absolute Deviation (MAD)4
Skewness1.1171719
Sum16581
Variance60.540563
MonotonicityNot monotonic
2023-07-14T01:12:48.353914image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram with fixed size bins (bins=40)
ValueCountFrequency (%)
10 202
 
13.7%
6 125
 
8.5%
8 103
 
7.0%
9 96
 
6.5%
5 88
 
6.0%
7 81
 
5.5%
1 81
 
5.5%
4 63
 
4.3%
12 48
 
3.3%
3 42
 
2.9%
Other values (30) 541
36.8%
  • Minimum 10 values
  • Maximum 10 values
ValueCountFrequency (%)
0 11
 
0.7%
1 81
5.5%
2 31
 
2.1%
3 42
 
2.9%
4 63
4.3%
5 88
6.0%
6 125
8.5%
7 81
5.5%
8 103
7.0%
9 96
6.5%
ValueCountFrequency (%)
40 2
 
0.1%
38 1
 
0.1%
37 4
0.3%
36 6
0.4%
35 3
 
0.2%
34 5
0.3%
33 7
0.5%
32 9
0.6%
31 9
0.6%
30 7
0.5%

TrainingTimesLastYear
Real number (ℝ)

Distinct7
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.7993197
Minimum0
Maximum6
Zeros54
Zeros (%)3.7%
Negative0
Negative (%)0.0%
Memory size11.6 KiB
2023-07-14T01:12:48.520064image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
  • Statistics
  • Histogram
  • Common values
  • Extreme values

Quantile statistics

Minimum0
5-th percentile1
Q12
median3
Q33
95-th percentile5
Maximum6
Range6
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.2892706
Coefficient of variation (CV)0.46056569
Kurtosis0.49499299
Mean2.7993197
Median Absolute Deviation (MAD)1
Skewness0.55312417
Sum4115
Variance1.6622187
MonotonicityNot monotonic
2023-07-14T01:12:48.656473image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram with fixed size bins (bins=7)
ValueCountFrequency (%)
2 547
37.2%
3 491
33.4%
4 123
 
8.4%
5 119
 
8.1%
1 71
 
4.8%
6 65
 
4.4%
0 54
 
3.7%
  • Minimum 10 values
  • Maximum 10 values
ValueCountFrequency (%)
0 54
 
3.7%
1 71
 
4.8%
2 547
37.2%
3 491
33.4%
4 123
 
8.4%
5 119
 
8.1%
6 65
 
4.4%
ValueCountFrequency (%)
6 65
 
4.4%
5 119
 
8.1%
4 123
 
8.4%
3 491
33.4%
2 547
37.2%
1 71
 
4.8%
0 54
 
3.7%

WorkLifeBalance
Categorical

Distinct4
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size11.6 KiB
3
893 
2
344 
4
153 
1
 
80
  • Overview
  • Categories
  • Words
  • Characters

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1470
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row3
3rd row3
4th row3
5th row3

Common Values

ValueCountFrequency (%)
3 893
60.7%
2 344
 
23.4%
4 153
 
10.4%
1 80
 
5.4%

Length

2023-07-14T01:12:48.817783image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

2023-07-14T01:12:48.998924image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
ValueCountFrequency (%)
3 893
60.7%
2 344
 
23.4%
4 153
 
10.4%
1 80
 
5.4%
  • Characters
  • Categories
  • Scripts
  • Blocks

Most occurring characters

ValueCountFrequency (%)
3 893
60.7%
2 344
 
23.4%
4 153
 
10.4%
1 80
 
5.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1470
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 893
60.7%
2 344
 
23.4%
4 153
 
10.4%
1 80
 
5.4%

Most occurring scripts

ValueCountFrequency (%)
Common 1470
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 893
60.7%
2 344
 
23.4%
4 153
 
10.4%
1 80
 
5.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1470
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 893
60.7%
2 344
 
23.4%
4 153
 
10.4%
1 80
 
5.4%

YearsAtCompany
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct37
Distinct (%)2.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean7.0081633
Minimum0
Maximum40
Zeros44
Zeros (%)3.0%
Negative0
Negative (%)0.0%
Memory size11.6 KiB
2023-07-14T01:12:49.169008image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
  • Statistics
  • Histogram
  • Common values
  • Extreme values

Quantile statistics

Minimum0
5-th percentile1
Q13
median5
Q39
95-th percentile20
Maximum40
Range40
Interquartile range (IQR)6

Descriptive statistics

Standard deviation6.1265252
Coefficient of variation (CV)0.87419841
Kurtosis3.9355088
Mean7.0081633
Median Absolute Deviation (MAD)3
Skewness1.7645295
Sum10302
Variance37.53431
MonotonicityNot monotonic
2023-07-14T01:12:49.348386image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram with fixed size bins (bins=37)
ValueCountFrequency (%)
5 196
13.3%
1 171
11.6%
3 128
8.7%
2 127
8.6%
10 120
8.2%
4 110
 
7.5%
7 90
 
6.1%
9 82
 
5.6%
8 80
 
5.4%
6 76
 
5.2%
Other values (27) 290
19.7%
  • Minimum 10 values
  • Maximum 10 values
ValueCountFrequency (%)
0 44
 
3.0%
1 171
11.6%
2 127
8.6%
3 128
8.7%
4 110
7.5%
5 196
13.3%
6 76
 
5.2%
7 90
6.1%
8 80
5.4%
9 82
5.6%
ValueCountFrequency (%)
40 1
 
0.1%
37 1
 
0.1%
36 2
 
0.1%
34 1
 
0.1%
33 5
0.3%
32 3
0.2%
31 3
0.2%
30 1
 
0.1%
29 2
 
0.1%
27 2
 
0.1%

YearsInCurrentRole
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct19
Distinct (%)1.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4.2292517
Minimum0
Maximum18
Zeros244
Zeros (%)16.6%
Negative0
Negative (%)0.0%
Memory size11.6 KiB
2023-07-14T01:12:49.533056image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
  • Statistics
  • Histogram
  • Common values
  • Extreme values

Quantile statistics

Minimum0
5-th percentile0
Q12
median3
Q37
95-th percentile11
Maximum18
Range18
Interquartile range (IQR)5

Descriptive statistics

Standard deviation3.623137
Coefficient of variation (CV)0.85668513
Kurtosis0.47742077
Mean4.2292517
Median Absolute Deviation (MAD)3
Skewness0.91736316
Sum6217
Variance13.127122
MonotonicityNot monotonic
2023-07-14T01:12:49.700124image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram with fixed size bins (bins=19)
ValueCountFrequency (%)
2 372
25.3%
0 244
16.6%
7 222
15.1%
3 135
 
9.2%
4 104
 
7.1%
8 89
 
6.1%
9 67
 
4.6%
1 57
 
3.9%
6 37
 
2.5%
5 36
 
2.4%
Other values (9) 107
 
7.3%
  • Minimum 10 values
  • Maximum 10 values
ValueCountFrequency (%)
0 244
16.6%
1 57
 
3.9%
2 372
25.3%
3 135
 
9.2%
4 104
 
7.1%
5 36
 
2.4%
6 37
 
2.5%
7 222
15.1%
8 89
 
6.1%
9 67
 
4.6%
ValueCountFrequency (%)
18 2
 
0.1%
17 4
 
0.3%
16 7
 
0.5%
15 8
 
0.5%
14 11
 
0.7%
13 14
 
1.0%
12 10
 
0.7%
11 22
 
1.5%
10 29
2.0%
9 67
4.6%

YearsSinceLastPromotion
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct16
Distinct (%)1.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.1877551
Minimum0
Maximum15
Zeros581
Zeros (%)39.5%
Negative0
Negative (%)0.0%
Memory size11.6 KiB
2023-07-14T01:12:49.888622image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
  • Statistics
  • Histogram
  • Common values
  • Extreme values

Quantile statistics

Minimum0
5-th percentile0
Q10
median1
Q33
95-th percentile9
Maximum15
Range15
Interquartile range (IQR)3

Descriptive statistics

Standard deviation3.2224303
Coefficient of variation (CV)1.4729392
Kurtosis3.6126731
Mean2.1877551
Median Absolute Deviation (MAD)1
Skewness1.98429
Sum3216
Variance10.384057
MonotonicityNot monotonic
2023-07-14T01:12:50.058550image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram with fixed size bins (bins=16)
ValueCountFrequency (%)
0 581
39.5%
1 357
24.3%
2 159
 
10.8%
7 76
 
5.2%
4 61
 
4.1%
3 52
 
3.5%
5 45
 
3.1%
6 32
 
2.2%
11 24
 
1.6%
8 18
 
1.2%
Other values (6) 65
 
4.4%
  • Minimum 10 values
  • Maximum 10 values
ValueCountFrequency (%)
0 581
39.5%
1 357
24.3%
2 159
 
10.8%
3 52
 
3.5%
4 61
 
4.1%
5 45
 
3.1%
6 32
 
2.2%
7 76
 
5.2%
8 18
 
1.2%
9 17
 
1.2%
ValueCountFrequency (%)
15 13
 
0.9%
14 9
 
0.6%
13 10
 
0.7%
12 10
 
0.7%
11 24
 
1.6%
10 6
 
0.4%
9 17
 
1.2%
8 18
 
1.2%
7 76
5.2%
6 32
2.2%

YearsWithCurrManager
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct18
Distinct (%)1.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4.1231293
Minimum0
Maximum17
Zeros263
Zeros (%)17.9%
Negative0
Negative (%)0.0%
Memory size11.6 KiB
2023-07-14T01:12:50.236936image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
  • Statistics
  • Histogram
  • Common values
  • Extreme values

Quantile statistics

Minimum0
5-th percentile0
Q12
median3
Q37
95-th percentile10
Maximum17
Range17
Interquartile range (IQR)5

Descriptive statistics

Standard deviation3.5681361
Coefficient of variation (CV)0.86539517
Kurtosis0.17105808
Mean4.1231293
Median Absolute Deviation (MAD)3
Skewness0.83345099
Sum6061
Variance12.731595
MonotonicityNot monotonic
2023-07-14T01:12:50.419449image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram with fixed size bins (bins=18)
ValueCountFrequency (%)
2 344
23.4%
0 263
17.9%
7 216
14.7%
3 142
9.7%
8 107
 
7.3%
4 98
 
6.7%
1 76
 
5.2%
9 64
 
4.4%
5 31
 
2.1%
6 29
 
2.0%
Other values (8) 100
 
6.8%
  • Minimum 10 values
  • Maximum 10 values
ValueCountFrequency (%)
0 263
17.9%
1 76
 
5.2%
2 344
23.4%
3 142
9.7%
4 98
 
6.7%
5 31
 
2.1%
6 29
 
2.0%
7 216
14.7%
8 107
 
7.3%
9 64
 
4.4%
ValueCountFrequency (%)
17 7
 
0.5%
16 2
 
0.1%
15 5
 
0.3%
14 5
 
0.3%
13 14
 
1.0%
12 18
 
1.2%
11 22
 
1.5%
10 27
 
1.8%
9 64
4.4%
8 107
7.3%

Fecha
Real number (ℝ)

Distinct37
Distinct (%)2.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2010.9918
Minimum1978
Maximum2018
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size11.6 KiB
2023-07-14T01:12:50.642852image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
  • Statistics
  • Histogram
  • Common values
  • Extreme values

Quantile statistics

Minimum1978
5-th percentile1998
Q12009
median2013
Q32015
95-th percentile2017
Maximum2018
Range40
Interquartile range (IQR)6

Descriptive statistics

Standard deviation6.1265252
Coefficient of variation (CV)0.0030465192
Kurtosis3.9355088
Mean2010.9918
Median Absolute Deviation (MAD)3
Skewness-1.7645295
Sum2956158
Variance37.53431
MonotonicityNot monotonic
2023-07-14T01:12:50.836334image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram with fixed size bins (bins=37)
ValueCountFrequency (%)
2013 196
13.3%
2017 171
11.6%
2015 128
8.7%
2016 127
8.6%
2008 120
8.2%
2014 110
 
7.5%
2011 90
 
6.1%
2009 82
 
5.6%
2010 80
 
5.4%
2012 76
 
5.2%
Other values (27) 290
19.7%
  • Minimum 10 values
  • Maximum 10 values
ValueCountFrequency (%)
1978 1
 
0.1%
1981 1
 
0.1%
1982 2
 
0.1%
1984 1
 
0.1%
1985 5
0.3%
1986 3
0.2%
1987 3
0.2%
1988 1
 
0.1%
1989 2
 
0.1%
1991 2
 
0.1%
ValueCountFrequency (%)
2018 44
 
3.0%
2017 171
11.6%
2016 127
8.6%
2015 128
8.7%
2014 110
7.5%
2013 196
13.3%
2012 76
 
5.2%
2011 90
6.1%
2010 80
5.4%
2009 82
5.6%

Interactions

2023-07-14T01:12:32.375851image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:42.000649image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:45.841708image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:50.051846image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:54.033185image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:57.178813image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:00.372878image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:03.333706image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:06.885224image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:10.048175image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:13.131603image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:17.448385image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:21.589250image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:25.059276image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:28.226204image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:32.576287image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:42.267936image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:46.161850image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:50.664207image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:54.265563image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:57.350354image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:00.538436image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:03.500757image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:07.089678image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:10.263830image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:13.308099image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:17.631897image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:21.762782image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:25.223639image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:28.396747image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:32.796697image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:42.442478image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:46.661514image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:51.117995image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:54.567754image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:57.536855image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:00.730919image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:03.668309image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:07.335022image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:10.497205image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:13.506599image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:17.811354image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:21.947292image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:25.399698image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:28.578263image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:33.010126image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:42.634954image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:46.902869image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:51.323445image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:54.842018image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:57.732337image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:00.921409image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:04.003408image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:07.556429image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:10.688692image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:13.705037image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:17.989877image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:22.214574image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:25.574236image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:28.754789image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:33.201220image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:42.809488image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:47.156190image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:51.548843image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:55.073403image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:57.924845image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:01.119880image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:04.409323image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:07.771856image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:10.877373image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:14.021223image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:18.176378image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:22.652404image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:25.756743image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:28.937303image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:33.425648image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:42.991332image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:47.376602image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:51.833081image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:55.271935image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:58.115309image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:01.305384image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:04.617765image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:08.060081image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:11.082823image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:14.494956image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:18.471586image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:22.850874image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:25.972167image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:29.164692image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:33.633065image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:43.165867image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:47.547154image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:52.006620image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:55.439487image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:58.289842image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:01.461963image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:04.827214image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:08.300438image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:11.287276image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:14.654768image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:18.884483image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:23.158049image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:26.164652image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:29.363883image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:33.812585image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:43.323443image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:47.717490image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:52.185139image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:55.614021image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:58.472498image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:01.624529image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:05.016698image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:08.456023image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:11.463807image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:14.815340image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:19.334278image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:23.524072image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:26.351152image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:30.443990image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:33.998577image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:43.490001image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:47.904989image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:52.377935image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:55.802516image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:58.875420image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:01.801841image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:05.230129image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:08.629355image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:11.676244image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:14.989680image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:19.720246image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:23.725533image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:26.549620image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:30.800038image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:34.201034image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:43.678493image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:48.096476image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:52.596352image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:56.001960image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:59.106801image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:02.006326image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:05.417151image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:08.840276image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:11.868720image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:15.170480image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:20.117184image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:23.918019image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:26.765046image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:31.064332image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:34.463334image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:43.856018image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:48.278987image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:52.785767image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:56.191454image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:59.314810image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:02.205761image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:05.656511image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:09.083624image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:12.072177image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:15.341028image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:20.366521image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:24.091864image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:26.945629image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:31.309675image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:34.655819image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:44.085404image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:48.476485image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:52.987981image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:56.382943image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:59.530233image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:02.387274image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:05.847003image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:09.266447image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:12.290594image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:15.519545image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:20.580943image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:24.284350image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:27.145095image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:31.508146image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:34.943051image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:45.299158image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:48.681911image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:53.203403image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:56.589390image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:59.744662image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:02.569788image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:06.073396image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:09.449927image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:12.509300image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:15.918478image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:20.821301image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:24.477831image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:27.357527image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:31.739523image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:35.144511image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:45.477682image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:48.889354image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:53.410858image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:56.784867image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:59.958984image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:02.964693image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:06.471333image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:09.627301image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:12.729712image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:16.484962image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:21.076619image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:24.669319image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:27.698615image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:31.935002image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:35.332676image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:45.654208image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:49.173198image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:53.644223image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:11:56.977351image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:00.158451image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:03.150198image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:06.703710image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:09.811810image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:12.919171image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:17.195063image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:21.311988image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:24.863826image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:27.949943image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
2023-07-14T01:12:32.123497image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/

Correlations

  • Auto
  • Heatmap
  • Table
2023-07-14T01:12:51.118994image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
AgeDailyRateDistanceFromHomeHourlyRateMonthlyIncomeMonthlyRateNumCompaniesWorkedPercentSalaryHikeTotalWorkingYearsTrainingTimesLastYearYearsAtCompanyYearsInCurrentRoleYearsSinceLastPromotionYearsWithCurrManagerFechaAttritionBusinessTravelDepartmentEducationEducationFieldEnvironmentSatisfactionGenderJobInvolvementJobLevelJobRoleJobSatisfactionMaritalStatusOverTimePerformanceRatingRelationshipSatisfactionStockOptionLevelWorkLifeBalance
Age1.0000.007-0.0190.0290.4720.0170.3530.0080.6570.0000.2520.1980.1740.195-0.2520.2130.0410.0000.1530.0000.0060.0000.0250.2950.1750.0000.1410.0000.0000.0350.0930.033
DailyRate0.0071.000-0.0030.0240.016-0.0320.0370.0250.021-0.011-0.0100.007-0.038-0.0050.0100.0620.0290.0000.0170.0390.0000.0310.0160.0000.0000.0000.0850.0000.0000.0000.0400.012
DistanceFromHome-0.019-0.0031.0000.0200.0030.040-0.0100.030-0.003-0.0250.0110.014-0.0050.004-0.0110.0670.0230.0000.0000.0000.0000.0300.0280.0540.0000.0000.0000.0660.0580.0250.0150.000
HourlyRate0.0290.0240.0201.000-0.020-0.0150.019-0.010-0.0120.000-0.029-0.034-0.052-0.0140.0290.0440.0000.0000.0000.0310.0000.0000.0000.0000.0230.0100.0000.0640.0000.0000.0520.000
MonthlyIncome0.4720.0160.003-0.0201.0000.0540.190-0.0340.710-0.0350.4640.3950.2650.365-0.4640.2170.0250.1870.0940.0730.0000.0460.0460.8640.4230.0000.0610.0000.0000.0430.0560.000
MonthlyRate0.017-0.0320.040-0.0150.0541.0000.020-0.0050.013-0.010-0.030-0.007-0.016-0.0350.0300.0100.0000.0000.0370.0000.0000.0000.0000.0160.0000.0480.0000.0000.0150.0550.0000.034
NumCompaniesWorked0.3530.037-0.0100.0190.1900.0201.0000.0000.315-0.047-0.171-0.128-0.067-0.1440.1710.1070.0000.0320.1010.0600.0000.0000.0000.1130.0790.0000.0380.0000.0000.0000.0000.051
PercentSalaryHike0.0080.0250.030-0.010-0.034-0.0050.0001.000-0.026-0.004-0.054-0.026-0.055-0.0260.0540.0000.0300.0000.0210.0000.0000.0490.0360.0000.0000.0000.0000.0000.9970.0270.0000.000
TotalWorkingYears0.6570.021-0.003-0.0120.7100.0130.315-0.0261.000-0.0140.5940.4930.3350.495-0.5940.2080.0000.0240.0950.0300.0000.0000.0000.5390.2930.0240.0690.0000.0000.0310.0640.000
TrainingTimesLastYear0.000-0.011-0.0250.000-0.035-0.010-0.047-0.004-0.0141.0000.0010.0050.010-0.012-0.0010.0790.0000.0000.0270.0440.0000.0000.0130.0170.0000.0210.0000.0990.0000.0000.0000.000
YearsAtCompany0.252-0.0100.011-0.0290.464-0.030-0.171-0.0540.5940.0011.0000.8540.5200.843-1.0000.1730.0000.0000.0710.0000.0310.0660.0530.3530.1880.0000.0000.0180.0000.0000.0120.020
YearsInCurrentRole0.1980.0070.014-0.0340.395-0.007-0.128-0.0260.4930.0050.8541.0000.5060.725-0.8540.1690.0000.0000.0290.0000.0360.0790.0000.2410.1320.0000.0400.0420.0310.0000.0230.025
YearsSinceLastPromotion0.174-0.038-0.005-0.0520.265-0.016-0.067-0.0550.3350.0100.5200.5061.0000.467-0.5200.0270.0300.0000.0000.0000.0000.0000.0000.2060.1110.0000.0350.0110.0000.0500.0560.000
YearsWithCurrManager0.195-0.0050.004-0.0140.365-0.035-0.144-0.0260.495-0.0120.8430.7250.4671.000-0.8430.1790.0640.0000.0000.0000.0000.0000.0440.2320.1180.0000.0000.0000.0300.0000.0300.031
Fecha-0.2520.010-0.0110.029-0.4640.0300.1710.054-0.594-0.001-1.000-0.854-0.520-0.8431.0000.1750.0000.0420.0650.0000.0170.0330.0520.3430.1850.0000.0590.0440.0000.0000.0000.000
Attrition0.2130.0620.0670.0440.2170.0100.1070.0000.2080.0790.1730.1690.0270.1790.1751.0000.1230.0770.0000.0870.1150.0090.1320.2160.2310.0990.1730.2430.0000.0390.1980.095
BusinessTravel0.0410.0290.0230.0000.0250.0000.0000.0300.0000.0000.0000.0000.0300.0640.0000.1231.0000.0000.0000.0000.0000.0370.0160.0000.0000.0000.0350.0240.0000.0000.0000.000
Department0.0000.0000.0000.0000.1870.0000.0320.0000.0240.0000.0000.0000.0000.0000.0420.0770.0001.0000.0000.5880.0180.0260.0000.2120.9370.0290.0300.0000.0000.0200.0000.047
Education0.1530.0170.0000.0000.0940.0370.1010.0210.0950.0270.0710.0290.0000.0000.0650.0000.0000.0001.0000.0550.0190.0000.0000.0880.0510.0150.0000.0010.0000.0160.0270.000
EducationField0.0000.0390.0000.0310.0730.0000.0600.0000.0300.0440.0000.0000.0000.0000.0000.0870.0000.5880.0551.0000.0310.0000.0000.0910.3360.0170.0000.0000.0000.0400.0320.027
EnvironmentSatisfaction0.0060.0000.0000.0000.0000.0000.0000.0000.0000.0000.0310.0360.0000.0000.0170.1150.0000.0180.0190.0311.0000.0000.0340.0000.0000.0000.0190.0600.0000.0000.0000.000
Gender0.0000.0310.0300.0000.0460.0000.0000.0490.0000.0000.0660.0790.0000.0000.0330.0090.0370.0260.0000.0000.0001.0000.0000.0480.0740.0000.0320.0310.0000.0000.0000.000
JobInvolvement0.0250.0160.0280.0000.0460.0000.0000.0360.0000.0130.0530.0000.0000.0440.0520.1320.0160.0000.0000.0000.0340.0001.0000.0000.0000.0000.0240.0000.0000.0000.0220.000
JobLevel0.2950.0000.0540.0000.8640.0160.1130.0000.5390.0170.3530.2410.2060.2320.3430.2160.0000.2120.0880.0910.0000.0480.0001.0000.5690.0000.0460.0000.0000.0000.0690.000
JobRole0.1750.0000.0000.0230.4230.0000.0790.0000.2930.0000.1880.1320.1110.1180.1850.2310.0000.9370.0510.3360.0000.0740.0000.5691.0000.0000.0610.0000.0000.0300.0390.029
JobSatisfaction0.0000.0000.0000.0100.0000.0480.0000.0000.0240.0210.0000.0000.0000.0000.0000.0990.0000.0290.0150.0170.0000.0000.0000.0000.0001.0000.0000.0220.0260.0000.0000.000
MaritalStatus0.1410.0850.0000.0000.0610.0000.0380.0000.0690.0000.0000.0400.0350.0000.0590.1730.0350.0300.0000.0000.0190.0320.0240.0460.0610.0001.0000.0000.0000.0250.5810.000
OverTime0.0000.0000.0660.0640.0000.0000.0000.0000.0000.0990.0180.0420.0110.0000.0440.2430.0240.0000.0010.0000.0600.0310.0000.0000.0000.0220.0001.0000.0000.0250.0000.000
PerformanceRating0.0000.0000.0580.0000.0000.0150.0000.9970.0000.0000.0000.0310.0000.0300.0000.0000.0000.0000.0000.0000.0000.0000.0000.0000.0000.0260.0000.0001.0000.0000.0000.000
RelationshipSatisfaction0.0350.0000.0250.0000.0430.0550.0000.0270.0310.0000.0000.0000.0500.0000.0000.0390.0000.0200.0160.0400.0000.0000.0000.0000.0300.0000.0250.0250.0001.0000.0300.000
StockOptionLevel0.0930.0400.0150.0520.0560.0000.0000.0000.0640.0000.0120.0230.0560.0300.0000.1980.0000.0000.0270.0320.0000.0000.0220.0690.0390.0000.5810.0000.0000.0301.0000.019
WorkLifeBalance0.0330.0120.0000.0000.0000.0340.0510.0000.0000.0000.0200.0250.0000.0310.0000.0950.0000.0470.0000.0270.0000.0000.0000.0000.0290.0000.0000.0000.0000.0000.0191.000

Missing values

  • Count
  • Matrix
2023-07-14T01:12:35.685903image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
A simple visualization of nullity by column.
2023-07-14T01:12:36.614043image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

  • First rows
  • Last rows
AgeAttritionBusinessTravelDailyRateDepartmentDistanceFromHomeEducationEducationFieldEnvironmentSatisfactionGenderHourlyRateJobInvolvementJobLevelJobRoleJobSatisfactionMaritalStatusMonthlyIncomeMonthlyRateNumCompaniesWorkedOverTimePercentSalaryHikePerformanceRatingRelationshipSatisfactionStockOptionLevelTotalWorkingYearsTrainingTimesLastYearWorkLifeBalanceYearsAtCompanyYearsInCurrentRoleYearsSinceLastPromotionYearsWithCurrManagerFecha
041YesTravel_Rarely1102Sales12Life Sciences2Female9432Sales Executive4Single5993194798Yes1131080164052012
149NoTravel_Frequently279Research & Development81Life Sciences3Male6122Research Scientist2Married5130249071No234411033107172008
237YesTravel_Rarely1373Research & Development22Other4Male9221Laboratory Technician3Single209023966Yes1532073300002018
333NoTravel_Frequently1392Research & Development34Life Sciences4Female5631Research Scientist3Married2909231591Yes1133083387302010
427NoTravel_Rarely591Research & Development21Medical1Male4031Laboratory Technician2Married3468166329No1234163322222016
532NoTravel_Frequently1005Research & Development22Life Sciences4Male7931Laboratory Technician4Single3068118640No1333082277362011
659NoTravel_Rarely1324Research & Development33Medical3Female8141Laboratory Technician1Married267099644Yes20413123210002017
730NoTravel_Rarely1358Research & Development241Life Sciences4Male6731Laboratory Technician3Divorced2693133351No2242112310002017
838NoTravel_Frequently216Research & Development233Life Sciences4Male4423Manufacturing Director3Single952687870No21420102397182009
936NoTravel_Rarely1299Research & Development273Medical3Male9432Healthcare Representative3Married5237165776No13322173277772011
AgeAttritionBusinessTravelDailyRateDepartmentDistanceFromHomeEducationEducationFieldEnvironmentSatisfactionGenderHourlyRateJobInvolvementJobLevelJobRoleJobSatisfactionMaritalStatusMonthlyIncomeMonthlyRateNumCompaniesWorkedOverTimePercentSalaryHikePerformanceRatingRelationshipSatisfactionStockOptionLevelTotalWorkingYearsTrainingTimesLastYearWorkLifeBalanceYearsAtCompanyYearsInCurrentRoleYearsSinceLastPromotionYearsWithCurrManagerFecha
146029NoTravel_Rarely468Research & Development284Medical4Female7321Research Scientist1Single378584891No1432053154042013
146150YesTravel_Rarely410Sales283Marketing4Male3923Sales Executive1Divorced10854165864Yes13321203332202015
146239NoTravel_Rarely722Sales241Marketing2Female6024Sales Executive4Married1203188280No113112122209961998
146331NoNon-Travel325Research & Development53Medical2Male7432Manufacturing Director1Single993637870No19320102394172009
146426NoTravel_Rarely1167Sales53Other4Female3021Sales Representative3Single2966213780No1834052342002014
146536NoTravel_Frequently884Research & Development232Medical3Male4142Laboratory Technician4Married2571122904No17331173352032013
146639NoTravel_Rarely613Research & Development61Medical4Male4223Healthcare Representative1Married9991214574No1531195377172011
146727NoTravel_Rarely155Research & Development43Life Sciences2Male8742Manufacturing Director2Married614251741Yes2042160362032012
146849NoTravel_Frequently1023Sales23Medical4Male6322Sales Executive2Married5390132432No14340173296082009
146934NoTravel_Rarely628Research & Development83Medical2Male8242Laboratory Technician3Married4404102282No1231063443122014

Report generated by YData.

In [ ]:
from sklearn.model_selection import train_test_split

# Separar características (X) y etiquetas (y)
X = df.drop('Attrition', axis=1)
y = df['Attrition']

# Dividir los datos en conjuntos de entrenamiento y prueba
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Mostrar las formas de los conjuntos de datos
print("Forma de X_train:", X_train.shape)
print("Forma de X_test:", X_test.shape)
print("Forma de y_train:", y_train.shape)
print("Forma de y_test:", y_test.shape)
Forma de X_train: (1176, 31)
Forma de X_test: (294, 31)
Forma de y_train: (1176,)
Forma de y_test: (294,)

Basándome en los datos que tenemos,decidimos utilizar un modelo de clasificación, ya que la variable objetivo (y_train y y_test) es una etiqueta categórica que indica si un tumor es benigno o maligno.

Un modelo de clasificación que ha demostrado ser efectivo en este tipo de problemas es la regresión logística. Es un algoritmo simple y fácil de interpretar, pero también puede proporcionar buenos resultados en muchos casos.

Entonces, procederemos a entrenar un modelo de regresión logística utilizando los datos de entrenamiento (X_train y y_train) y luego evaluaremos su rendimiento utilizando los datos de prueba (X_test y y_test).

En este caso, utilizaremos un algoritmo de clasificación como la regresión logística, que es una opción común y eficiente para problemas de clasificación binaria. La regresión logística es un modelo lineal que se utiliza para predecir la probabilidad de pertenecer a una clase determinada.

A continuación,mostramos cómo entrenar y evaluar un modelo de regresión logística utilizando los conjuntos de datos que hemos preparado. Utilizaremos la biblioteca scikit-learn, que es una biblioteca de aprendizaje automático en Python.

In [ ]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import OneHotEncoder
import numpy as np


# Combinar los datos de entrenamiento y prueba
X_combined = np.vstack((X_train, X_test))

# Codificar variables categóricas usando OneHotEncoder
encoder = OneHotEncoder()
X_combined_encoded = encoder.fit_transform(X_combined)

# Dividir los datos codificados de nuevo en entrenamiento y prueba
X_train_encoded = X_combined_encoded[:X_train.shape[0]]
X_test_encoded = X_combined_encoded[X_train.shape[0]:]

# Crear una instancia del modelo de regresión logística
model = LogisticRegression()

# Entrenar el modelo con los datos de entrenamiento codificados
model.fit(X_train_encoded, y_train)

# Realizar predicciones en los datos de prueba codificados
y_pred = model.predict(X_test_encoded)

# Calcular la precisión del modelo
accuracy = accuracy_score(y_test, y_pred)

# Imprimir la precisión del modelo
print("Precisión del modelo: {:.2f}%".format(accuracy * 100))
Precisión del modelo: 87.41%

Evaluar el rendimiento del modelo utilizando métricas adicionales es una práctica recomendada para obtener una comprensión más completa de su desempeño. A continuación, mostraremos cómo calcular algunas métricas comunes de evaluación de modelos de clasificación y también daremos un ejemplo de otro enfoque de modelado utilizando un clasificador de Bosques Aleatorios.

Métricas adicionales: Matriz de confusión: proporcionara una descripción detallada del rendimiento del modelo al mostrar la cantidad de verdaderos positivos, falsos positivos, verdaderos negativos y falsos negativos. Precisión: medira la proporción de predicciones positivas correctas con respecto al total de predicciones positivas. Recall (sensibilidad): medira la proporción de verdaderos positivos detectados con respecto al total de casos positivos. F1-score: es una medida de equilibrio entre la precisión y el recall. Calcular estas métricas utilizando el modelo de regresion logistica

  1. Evaluación del modelo: Evalúaremos el rendimiento del modelo utilizando métricas adecuadas y realizar ajustes si es necesario.
In [ ]:
from sklearn.metrics import confusion_matrix


# Crear la matriz de confusión
confusion = confusion_matrix(y_test, y_pred)

# Imprimir la matriz de confusión
print("Matriz de Confusión:")
print(confusion)
Matriz de Confusión:
[[241  14]
 [ 23  16]]

Según la matriz de confusión proporcionada, podemos realizar la siguiente evaluación del modelo:

Verdaderos positivos (True Positives, TP): 241 Verdaderos negativos (True Negatives, TN): 16 Falsos positivos (False Positives, FP): 14 Falsos negativos (False Negatives, FN): 23 Basándonos en estos valores, podemos calcular algunas métricas de evaluación del modelo:

Precisión (Precision): Mide la proporción de predicciones positivas que son correctas. Precisión = TP / (TP + FP) = 241 / (241 + 14) = 0.945

Exhaustividad o Sensibilidad (Recall o Sensitivity): Mide la proporción de casos positivos que se han identificado correctamente. Exhaustividad = TP / (TP + FN) = 241 / (241 + 23) = 0.909

Puntuación F1 (F1 Score): Es una medida que combina la precisión y la exhaustividad en un solo valor, proporcionando un equilibrio entre ambas. Puntuación F1 = 2 * (Precisión * Exhaustividad) / (Precisión + Exhaustividad) = 2 * (0.945 * 0.909) / (0.945 + 0.909) = 0.9277

Exactitud (Accuracy): Mide la proporción de predicciones correctas en general. Exactitud = (TP + TN) / (TP + TN + FP + FN) = (241 + 16) / (241 + 16 + 14 + 23) = 0.872

Basándonos en estos resultados, el modelo muestra una precisión del 94.5%, lo que significa que el 94.5% de las predicciones positivas son correctas. La exhaustividad es del 90.9%, lo que indica que el modelo logra identificar correctamente el 90.9% de los casos positivos. La puntuación F1 es de 0.926, que proporciona un equilibrio entre precisión y exhaustividad. La exactitud del modelo es del 87.2%, lo que indica que el 87.2% de todas las predicciones son correctas.

En general, el modelo parece tener un rendimiento aceptable, pero sería recomendable evaluarlo en conjunto con otras métricas y realizar comparaciones con otros modelos antes de sacar conclusiones definitivas.

Precisión (Precision): Ya mencionada anteriormente, mide la proporción de predicciones positivas que son correctas.

In [ ]:
from sklearn.metrics import precision_score
cols = ["Over18","EmployeeCount","EmployeeNumber","StandardHours"]

df.drop(columns=cols, inplace=True)

# Calcular la precisión del modelo
precision = precision_score(y_test, y_pred, pos_label='Yes')

# Imprimir la precisión del modelo
print("Precisión del modelo: {:.2f}%".format(precision * 100))
Precisión del modelo: 53.33%
In [ ]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Cargar los datos
data = pd.read_csv(r'C:\Users\Admin\Desktop\IBM\Data\WA_Fn-UseC_-HR-Employee-Attrition.csv')
cols = ["Over18","EmployeeCount","EmployeeNumber","StandardHours"]

df.drop(columns=cols, inplace=True)

# Preprocesar los datos
le = LabelEncoder()
data['Attrition'] = le.fit_transform(data['Attrition'])
data = pd.get_dummies(data)

# Dividir los datos en conjunto de entrenamiento y prueba
X = data.drop('Attrition', axis=1)
y = data['Attrition']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Entrenar el modelo de Random Forest
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Hacer predicciones en el conjunto de prueba
y_pred = model.predict(X_test)

# Calcular las métricas de evaluación
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Imprimir las métricas
print('Accuracy:', accuracy)
print('Precision:', precision)
print('Recall:', recall)
print('F1 Score:', f1)
Accuracy: 0.8775510204081632
Precision: 0.7142857142857143
Recall: 0.1282051282051282
F1 Score: 0.21739130434782608

En general, el modelo muestra una precisión aceptable, lo que significa que es relativamente preciso cuando predice un resultado positivo. Sin embargo, tiene un bajo índice de recuperación (recall), lo que indica que tiene dificultades para identificar correctamente las instancias positivas. Esto se refleja en un puntaje F1 moderado, que equilibra la precisión y el recall. Aunque la precisión es alta, la baja recuperación y el puntaje F1 indican que hay margen para mejorar el rendimiento del modelo. En resumen, el modelo tiene un buen nivel de precisión general, pero es menos efectivo para identificar correctamente las instancias positivas.

In [ ]:
import pandas as pd

# Cargar el archivo CSV en un dataframe
data = pd.read_csv(r'C:\Users\Admin\Desktop\IBM\Data\WA_Fn-UseC_-HR-Employee-Attrition.csv')

# Convertir la columna 'Attrition' en valores numéricos (1 para 'Yes' y 0 para 'No')
df['Attrition'] = df['Attrition'].apply(lambda x: 1 if x == 'Yes' else 0)

# Verificar los primeros registros de la columna 'Attrition' convertida
print(df['Attrition'].head())
0    0
1    0
2    0
3    0
4    0
Name: Attrition, dtype: int64
In [ ]:
df.head().T
Out[ ]:
0 1 2 3 4
Age 41 49 37 33 27
Attrition 0 0 0 0 0
BusinessTravel Travel_Rarely Travel_Frequently Travel_Rarely Travel_Frequently Travel_Rarely
DailyRate 1102 279 1373 1392 591
Department Sales Research & Development Research & Development Research & Development Research & Development
DistanceFromHome 1 8 2 3 2
Education 2 1 2 4 1
EducationField Life Sciences Life Sciences Other Life Sciences Medical
EmployeeCount 1 1 1 1 1
EmployeeNumber 1 2 4 5 7
EnvironmentSatisfaction 2 3 4 4 1
Gender Female Male Male Female Male
HourlyRate 94 61 92 56 40
JobInvolvement 3 2 2 3 3
JobLevel 2 2 1 1 1
JobRole Sales Executive Research Scientist Laboratory Technician Research Scientist Laboratory Technician
JobSatisfaction 4 2 3 3 2
MaritalStatus Single Married Single Married Married
MonthlyIncome 5993 5130 2090 2909 3468
MonthlyRate 19479 24907 2396 23159 16632
NumCompaniesWorked 8 1 6 1 9
Over18 Y Y Y Y Y
OverTime Yes No Yes Yes No
PercentSalaryHike 11 23 15 11 12
PerformanceRating 3 4 3 3 3
RelationshipSatisfaction 1 4 2 3 4
StandardHours 80 80 80 80 80
StockOptionLevel 0 1 0 0 1
TotalWorkingYears 8 10 7 8 6
TrainingTimesLastYear 0 3 3 3 3
WorkLifeBalance 1 3 3 3 3
YearsAtCompany 6 10 0 8 2
YearsInCurrentRole 4 7 0 7 2
YearsSinceLastPromotion 0 1 0 3 2
YearsWithCurrManager 5 7 0 0 2
Fecha 2012 2008 2018 2010 2016
In [ ]:
import pandas as pd

data = pd.read_csv(r'C:\Users\Admin\Desktop\IBM\Data\WA_Fn-UseC_-HR-Employee-Attrition.csv')

# Convertir la columna 'Attrition' en valores binarios (1 para 'Sí' y 0 para 'No')
df['Attrition'] = df['Attrition'].map({'Yes': 1, 'No': 0})

# Calcular la matriz de correlación utilizando el método Pearson
corr_p = df.corr()

# Calcular la matriz de correlación utilizando el método Kendall
corr_k = df.corr(method='kendall')

# Calcular la matriz de correlación utilizando el método Spearman
corr_s = df.corr(method='spearman')

# Imprimir la matriz de correlación con el método Spearman
print(corr_s)
                               Age  Attrition  DailyRate  DistanceFromHome  \
Age                       1.000000        NaN   0.007290         -0.019291   
Attrition                      NaN        NaN        NaN               NaN   
DailyRate                 0.007290        NaN   1.000000         -0.002754   
DistanceFromHome         -0.019291        NaN  -0.002754          1.000000   
Education                 0.204937        NaN  -0.013607          0.015708   
EmployeeCount                  NaN        NaN        NaN               NaN   
EmployeeNumber           -0.001770        NaN  -0.051800          0.038906   
EnvironmentSatisfaction   0.009820        NaN   0.018961         -0.010401   
HourlyRate                0.028858        NaN   0.023511          0.020446   
JobInvolvement            0.034456        NaN   0.042469          0.034430   
JobLevel                  0.489618        NaN   0.003816          0.022148   
JobSatisfaction          -0.005185        NaN   0.027829         -0.013078   
MonthlyIncome             0.471902        NaN   0.016260          0.002512   
MonthlyRate               0.017451        NaN  -0.032360          0.039618   
NumCompaniesWorked        0.353213        NaN   0.036548         -0.009592   
PercentSalaryHike         0.007709        NaN   0.025070          0.029666   
PerformanceRating         0.000093        NaN   0.000624          0.011320   
RelationshipSatisfaction  0.046063        NaN   0.009685          0.005852   
StandardHours                  NaN        NaN        NaN               NaN   
StockOptionLevel          0.056633        NaN   0.038514          0.030190   
TotalWorkingYears         0.656896        NaN   0.020951         -0.002912   
TrainingTimesLastYear     0.000316        NaN  -0.011339         -0.024848   
WorkLifeBalance          -0.003707        NaN  -0.040352         -0.020402   
YearsAtCompany            0.251686        NaN  -0.009778          0.010513   
YearsInCurrentRole        0.197978        NaN   0.007208          0.013708   
YearsSinceLastPromotion   0.173647        NaN  -0.037631         -0.004685   
YearsWithCurrManager      0.194818        NaN  -0.004717          0.004448   
Fecha                    -0.251686        NaN   0.009778         -0.010513   

                          Education  EmployeeCount  EmployeeNumber  \
Age                        0.204937            NaN       -0.001770   
Attrition                       NaN            NaN             NaN   
DailyRate                 -0.013607            NaN       -0.051800   
DistanceFromHome           0.015708            NaN        0.038906   
Education                  1.000000            NaN        0.042815   
EmployeeCount                   NaN            NaN             NaN   
EmployeeNumber             0.042815            NaN        1.000000   
EnvironmentSatisfaction   -0.027625            NaN        0.021750   
HourlyRate                 0.014432            NaN        0.034717   
JobInvolvement             0.037231            NaN       -0.002453   
JobLevel                   0.107419            NaN       -0.011057   
JobSatisfaction           -0.005175            NaN       -0.047150   
MonthlyIncome              0.120028            NaN        0.001797   
MonthlyRate               -0.021214            NaN        0.011933   
NumCompaniesWorked         0.135103            NaN        0.007011   
PercentSalaryHike          0.004300            NaN       -0.008179   
PerformanceRating         -0.025081            NaN       -0.020675   
RelationshipSatisfaction  -0.013173            NaN       -0.072991   
StandardHours                   NaN            NaN             NaN   
StockOptionLevel           0.013794            NaN        0.059480   
TotalWorkingYears          0.162177            NaN       -0.003748   
TrainingTimesLastYear     -0.023749            NaN        0.026502   
WorkLifeBalance            0.017350            NaN        0.009994   
YearsAtCompany             0.064196            NaN        0.013205   
YearsInCurrentRole         0.054567            NaN       -0.001079   
YearsSinceLastPromotion    0.032203            NaN        0.007857   
YearsWithCurrManager       0.051292            NaN       -0.005138   
Fecha                     -0.064196            NaN       -0.013205   

                          EnvironmentSatisfaction  HourlyRate  JobInvolvement  \
Age                                      0.009820    0.028858        0.034456   
Attrition                                     NaN         NaN             NaN   
DailyRate                                0.018961    0.023511        0.042469   
DistanceFromHome                        -0.010401    0.020446        0.034430   
Education                               -0.027625    0.014432        0.037231   
EmployeeCount                                 NaN         NaN             NaN   
EmployeeNumber                           0.021750    0.034717       -0.002453   
EnvironmentSatisfaction                  1.000000   -0.052380       -0.015301   
HourlyRate                              -0.052380    1.000000        0.043884   
JobInvolvement                          -0.015301    0.043884        1.000000   
JobLevel                                -0.000192   -0.033876       -0.018424   
JobSatisfaction                         -0.002993   -0.068340       -0.012148   
MonthlyIncome                           -0.015163   -0.019762       -0.024552   
MonthlyRate                              0.037477   -0.014888       -0.018117   
NumCompaniesWorked                       0.006151    0.019209        0.015448   
PercentSalaryHike                       -0.030489   -0.009876       -0.016999   
PerformanceRating                       -0.029160   -0.002185       -0.024733   
RelationshipSatisfaction                 0.005353    0.000259        0.037857   
StandardHours                                 NaN         NaN             NaN   
StockOptionLevel                         0.009826    0.050543        0.034464   
TotalWorkingYears                       -0.013882   -0.012072        0.006444   
TrainingTimesLastYear                   -0.011659    0.000292        0.002014   
WorkLifeBalance                          0.027169   -0.010003       -0.019889   
YearsAtCompany                           0.008425   -0.029032        0.013836   
YearsInCurrentRole                       0.020140   -0.034016        0.015548   
YearsSinceLastPromotion                  0.026082   -0.052412       -0.008307   
YearsWithCurrManager                    -0.001732   -0.013811        0.037397   
Fecha                                   -0.008425    0.029032       -0.013836   

                          ...  StandardHours  StockOptionLevel  \
Age                       ...            NaN          0.056633   
Attrition                 ...            NaN               NaN   
DailyRate                 ...            NaN          0.038514   
DistanceFromHome          ...            NaN          0.030190   
Education                 ...            NaN          0.013794   
EmployeeCount             ...            NaN               NaN   
EmployeeNumber            ...            NaN          0.059480   
EnvironmentSatisfaction   ...            NaN          0.009826   
HourlyRate                ...            NaN          0.050543   
JobInvolvement            ...            NaN          0.034464   
JobLevel                  ...            NaN          0.047786   
JobSatisfaction           ...            NaN          0.012785   
MonthlyIncome             ...            NaN          0.045852   
MonthlyRate               ...            NaN         -0.037274   
NumCompaniesWorked        ...            NaN          0.032277   
PercentSalaryHike         ...            NaN          0.023446   
PerformanceRating         ...            NaN          0.011028   
RelationshipSatisfaction  ...            NaN         -0.056249   
StandardHours             ...            NaN               NaN   
StockOptionLevel          ...            NaN          1.000000   
TotalWorkingYears         ...            NaN          0.052618   
TrainingTimesLastYear     ...            NaN          0.003388   
WorkLifeBalance           ...            NaN         -0.016568   
YearsAtCompany            ...            NaN          0.064974   
YearsInCurrentRole        ...            NaN          0.071627   
YearsSinceLastPromotion   ...            NaN          0.027502   
YearsWithCurrManager      ...            NaN          0.053646   
Fecha                     ...            NaN         -0.064974   

                          TotalWorkingYears  TrainingTimesLastYear  \
Age                                0.656896               0.000316   
Attrition                               NaN                    NaN   
DailyRate                          0.020951              -0.011339   
DistanceFromHome                  -0.002912              -0.024848   
Education                          0.162177              -0.023749   
EmployeeCount                           NaN                    NaN   
EmployeeNumber                    -0.003748               0.026502   
EnvironmentSatisfaction           -0.013882              -0.011659   
HourlyRate                        -0.012072               0.000292   
JobInvolvement                     0.006444               0.002014   
JobLevel                           0.734678              -0.019729   
JobSatisfaction                   -0.015875              -0.011681   
MonthlyIncome                      0.710024              -0.034847   
MonthlyRate                        0.013360              -0.010018   
NumCompaniesWorked                 0.315196              -0.047336   
PercentSalaryHike                 -0.025528              -0.004106   
PerformanceRating                  0.011678              -0.016676   
RelationshipSatisfaction           0.003971               0.005424   
StandardHours                           NaN                    NaN   
StockOptionLevel                   0.052618               0.003388   
TotalWorkingYears                  1.000000              -0.014151   
TrainingTimesLastYear             -0.014151               1.000000   
WorkLifeBalance                    0.003004               0.023690   
YearsAtCompany                     0.594193               0.001389   
YearsInCurrentRole                 0.492721               0.004581   
YearsSinceLastPromotion            0.334996               0.010215   
YearsWithCurrManager               0.495254              -0.011628   
Fecha                             -0.594193              -0.001389   

                          WorkLifeBalance  YearsAtCompany  YearsInCurrentRole  \
Age                             -0.003707        0.251686            0.197978   
Attrition                             NaN             NaN                 NaN   
DailyRate                       -0.040352       -0.009778            0.007208   
DistanceFromHome                -0.020402        0.010513            0.013708   
Education                        0.017350        0.064196            0.054567   
EmployeeCount                         NaN             NaN                 NaN   
EmployeeNumber                   0.009994        0.013205           -0.001079   
EnvironmentSatisfaction          0.027169        0.008425            0.020140   
HourlyRate                      -0.010003       -0.029032           -0.034016   
JobInvolvement                  -0.019889        0.013836            0.015548   
JobLevel                         0.040466        0.472283            0.391085   
JobSatisfaction                 -0.029781        0.012280            0.000531   
MonthlyIncome                    0.030759        0.464315            0.394712   
MonthlyRate                      0.006316       -0.029862           -0.006865   
NumCompaniesWorked               0.009103       -0.171070           -0.127673   
PercentSalaryHike                0.000930       -0.054117           -0.025528   
PerformanceRating                0.006808        0.017224            0.032719   
RelationshipSatisfaction         0.017684       -0.001267           -0.021400   
StandardHours                         NaN             NaN                 NaN   
StockOptionLevel                -0.016568        0.064974            0.071627   
TotalWorkingYears                0.003004        0.594193            0.492721   
TrainingTimesLastYear            0.023690        0.001389            0.004581   
WorkLifeBalance                  1.000000        0.004675            0.023214   
YearsAtCompany                   0.004675        1.000000            0.854000   
YearsInCurrentRole               0.023214        0.854000            1.000000   
YearsSinceLastPromotion          0.002151        0.519966            0.505657   
YearsWithCurrManager            -0.004591        0.842803            0.724754   
Fecha                           -0.004675       -1.000000           -0.854000   

                          YearsSinceLastPromotion  YearsWithCurrManager  \
Age                                      0.173647              0.194818   
Attrition                                     NaN                   NaN   
DailyRate                               -0.037631             -0.004717   
DistanceFromHome                        -0.004685              0.004448   
Education                                0.032203              0.051292   
EmployeeCount                                 NaN                   NaN   
EmployeeNumber                           0.007857             -0.005138   
EnvironmentSatisfaction                  0.026082             -0.001732   
HourlyRate                              -0.052412             -0.013811   
JobInvolvement                          -0.008307              0.037397   
JobLevel                                 0.269096              0.370889   
JobSatisfaction                          0.007497             -0.016772   
MonthlyIncome                            0.264599              0.365386   
MonthlyRate                             -0.016285             -0.035059   
NumCompaniesWorked                      -0.066950             -0.144129   
PercentSalaryHike                       -0.055362             -0.026049   
PerformanceRating                       -0.006578              0.025560   
RelationshipSatisfaction                 0.036963              0.000280   
StandardHours                                 NaN                   NaN   
StockOptionLevel                         0.027502              0.053646   
TotalWorkingYears                        0.334996              0.495254   
TrainingTimesLastYear                    0.010215             -0.011628   
WorkLifeBalance                          0.002151             -0.004591   
YearsAtCompany                           0.519966              0.842803   
YearsInCurrentRole                       0.505657              0.724754   
YearsSinceLastPromotion                  1.000000              0.466713   
YearsWithCurrManager                     0.466713              1.000000   
Fecha                                   -0.519966             -0.842803   

                             Fecha  
Age                      -0.251686  
Attrition                      NaN  
DailyRate                 0.009778  
DistanceFromHome         -0.010513  
Education                -0.064196  
EmployeeCount                  NaN  
EmployeeNumber           -0.013205  
EnvironmentSatisfaction  -0.008425  
HourlyRate                0.029032  
JobInvolvement           -0.013836  
JobLevel                 -0.472283  
JobSatisfaction          -0.012280  
MonthlyIncome            -0.464315  
MonthlyRate               0.029862  
NumCompaniesWorked        0.171070  
PercentSalaryHike         0.054117  
PerformanceRating        -0.017224  
RelationshipSatisfaction  0.001267  
StandardHours                  NaN  
StockOptionLevel         -0.064974  
TotalWorkingYears        -0.594193  
TrainingTimesLastYear    -0.001389  
WorkLifeBalance          -0.004675  
YearsAtCompany           -1.000000  
YearsInCurrentRole       -0.854000  
YearsSinceLastPromotion  -0.519966  
YearsWithCurrManager     -0.842803  
Fecha                     1.000000  

[28 rows x 28 columns]
In [ ]:
import pandas as pd

df = pd.read_csv(r'C:\Users\Admin\Desktop\IBM\Data\WA_Fn-UseC_-HR-Employee-Attrition.csv')

# Convertir la columna 'Attrition' en valores binarios (1 para 'Sí' y 0 para 'No')
df['Attrition'] = df['Attrition'].map({'Yes': 1, 'No': 0})

# Calcular la matriz de correlación utilizando el método Pearson
corr_p = df.corr()

# Obtener las correlaciones absolutas con 'Attrition' y ordenar de forma descendente
corr_abs = corr_p['Attrition'].abs().sort_values(ascending=False)

# Imprimir las correlaciones absolutas
print(corr_abs)
Attrition                   1.000000
TotalWorkingYears           0.171063
JobLevel                    0.169105
YearsInCurrentRole          0.160545
MonthlyIncome               0.159840
Age                         0.159205
YearsWithCurrManager        0.156199
StockOptionLevel            0.137145
YearsAtCompany              0.134392
Fecha                       0.134392
JobInvolvement              0.130016
JobSatisfaction             0.103481
EnvironmentSatisfaction     0.103369
DistanceFromHome            0.077924
WorkLifeBalance             0.063939
TrainingTimesLastYear       0.059478
DailyRate                   0.056652
RelationshipSatisfaction    0.045872
NumCompaniesWorked          0.043494
YearsSinceLastPromotion     0.033019
Education                   0.031373
MonthlyRate                 0.015170
PercentSalaryHike           0.013478
EmployeeNumber              0.010577
HourlyRate                  0.006846
PerformanceRating           0.002889
EmployeeCount                    NaN
StandardHours                    NaN
Name: Attrition, dtype: float64

El resultado que hemos obtenido muestra la correlación de la variable "Attrition" con otras variables. Las correlaciones están ordenadas de forma descendente en función de su magnitud.

La correlación entre "Attrition" y cada variable nos ayuda a entender la relación relativa entre ellas. Aquí encontramos algunos puntos clave a considerar:

Las variables con correlaciones positivas más altas indican una relación directa con la variable "Attrition". Esto significa que a medida que los valores de esas variables aumentan, es más probable que haya atrición.

Las variables con correlaciones negativas indican una relación inversa con "Attrition". A medida que los valores de esas variables aumentan, es menos probable que haya atrición.

Las correlaciones cercanas a cero indican una relación débil o no lineal entre las variables.

Las correlaciones más altas en magnitud, positivas o negativas, indican una relación más fuerte con "Attrition". Por ejemplo, en nuestro caso, las variables "TotalWorkingYears", "JobLevel", "YearsInCurrentRole" y "MonthlyIncome" muestran correlaciones positivas relativamente altas con "Attrition".

Es importante que tengamos en cuenta que la correlación no implica causalidad. Es posible que exista una correlación entre dos variables sin que una sea la causa directa de la otra. Para un análisis más profundo, utilizaremos otras técnicas y explorar el contexto y el conocimiento del dominio.

In [ ]:
import pandas as pd

df = pd.read_csv(r'C:\Users\Admin\Desktop\IBM\Data\WA_Fn-UseC_-HR-Employee-Attrition.csv')
# Convertir la columna 'Attrition' en valores binarios (1 para 'Sí' y 0 para 'No')
df['Attrition'] = df['Attrition'].map({'Yes': 1, 'No': 0})
corr_p = df.corr()
corr_k = df.corr(method ='kendall')
corr_s = df.corr(method ='spearman')

corr_s 
Out[ ]:
Age Attrition DailyRate DistanceFromHome Education EmployeeCount EmployeeNumber EnvironmentSatisfaction HourlyRate JobInvolvement ... StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager Fecha
Age 1.000000 -0.171214 0.007290 -0.019291 0.204937 NaN -0.001770 0.009820 0.028858 0.034456 ... NaN 0.056633 0.656896 0.000316 -0.003707 0.251686 0.197978 0.173647 0.194818 -0.251686
Attrition -0.171214 1.000000 -0.056970 0.079248 -0.030346 NaN -0.010369 -0.096486 -0.006692 -0.119496 ... NaN -0.172296 -0.199002 -0.051757 -0.051951 -0.190419 -0.180623 -0.053273 -0.175355 0.190419
DailyRate 0.007290 -0.056970 1.000000 -0.002754 -0.013607 NaN -0.051800 0.018961 0.023511 0.042469 ... NaN 0.038514 0.020951 -0.011339 -0.040352 -0.009778 0.007208 -0.037631 -0.004717 0.009778
DistanceFromHome -0.019291 0.079248 -0.002754 1.000000 0.015708 NaN 0.038906 -0.010401 0.020446 0.034430 ... NaN 0.030190 -0.002912 -0.024848 -0.020402 0.010513 0.013708 -0.004685 0.004448 -0.010513
Education 0.204937 -0.030346 -0.013607 0.015708 1.000000 NaN 0.042815 -0.027625 0.014432 0.037231 ... NaN 0.013794 0.162177 -0.023749 0.017350 0.064196 0.054567 0.032203 0.051292 -0.064196
EmployeeCount NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
EmployeeNumber -0.001770 -0.010369 -0.051800 0.038906 0.042815 NaN 1.000000 0.021750 0.034717 -0.002453 ... NaN 0.059480 -0.003748 0.026502 0.009994 0.013205 -0.001079 0.007857 -0.005138 -0.013205
EnvironmentSatisfaction 0.009820 -0.096486 0.018961 -0.010401 -0.027625 NaN 0.021750 1.000000 -0.052380 -0.015301 ... NaN 0.009826 -0.013882 -0.011659 0.027169 0.008425 0.020140 0.026082 -0.001732 -0.008425
HourlyRate 0.028858 -0.006692 0.023511 0.020446 0.014432 NaN 0.034717 -0.052380 1.000000 0.043884 ... NaN 0.050543 -0.012072 0.000292 -0.010003 -0.029032 -0.034016 -0.052412 -0.013811 0.029032
JobInvolvement 0.034456 -0.119496 0.042469 0.034430 0.037231 NaN -0.002453 -0.015301 0.043884 1.000000 ... NaN 0.034464 0.006444 0.002014 -0.019889 0.013836 0.015548 -0.008307 0.037397 -0.013836
JobLevel 0.489618 -0.190370 0.003816 0.022148 0.107419 NaN -0.011057 -0.000192 -0.033876 -0.018424 ... NaN 0.047786 0.734678 -0.019729 0.040466 0.472283 0.391085 0.269096 0.370889 -0.472283
JobSatisfaction -0.005185 -0.102948 0.027829 -0.013078 -0.005175 NaN -0.047150 -0.002993 -0.068340 -0.012148 ... NaN 0.012785 -0.015875 -0.011681 -0.029781 0.012280 0.000531 0.007497 -0.016772 -0.012280
MonthlyIncome 0.471902 -0.198305 0.016260 0.002512 0.120028 NaN 0.001797 -0.015163 -0.019762 -0.024552 ... NaN 0.045852 0.710024 -0.034847 0.030759 0.464315 0.394712 0.264599 0.365386 -0.464315
MonthlyRate 0.017451 0.015258 -0.032360 0.039618 -0.021214 NaN 0.011933 0.037477 -0.014888 -0.018117 ... NaN -0.037274 0.013360 -0.010018 0.006316 -0.029862 -0.006865 -0.016285 -0.035059 0.029862
NumCompaniesWorked 0.353213 0.030505 0.036548 -0.009592 0.135103 NaN 0.007011 0.006151 0.019209 0.015448 ... NaN 0.032277 0.315196 -0.047336 0.009103 -0.171070 -0.127673 -0.066950 -0.144129 0.171070
PercentSalaryHike 0.007709 -0.023612 0.025070 0.029666 0.004300 NaN -0.008179 -0.030489 -0.009876 -0.016999 ... NaN 0.023446 -0.025528 -0.004106 0.000930 -0.054117 -0.025528 -0.055362 -0.026049 0.054117
PerformanceRating 0.000093 0.002889 0.000624 0.011320 -0.025081 NaN -0.020675 -0.029160 -0.002185 -0.024733 ... NaN 0.011028 0.011678 -0.016676 0.006808 0.017224 0.032719 -0.006578 0.025560 -0.017224
RelationshipSatisfaction 0.046063 -0.042664 0.009685 0.005852 -0.013173 NaN -0.072991 0.005353 0.000259 0.037857 ... NaN -0.056249 0.003971 0.005424 0.017684 -0.001267 -0.021400 0.036963 0.000280 0.001267
StandardHours NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
StockOptionLevel 0.056633 -0.172296 0.038514 0.030190 0.013794 NaN 0.059480 0.009826 0.050543 0.034464 ... NaN 1.000000 0.052618 0.003388 -0.016568 0.064974 0.071627 0.027502 0.053646 -0.064974
TotalWorkingYears 0.656896 -0.199002 0.020951 -0.002912 0.162177 NaN -0.003748 -0.013882 -0.012072 0.006444 ... NaN 0.052618 1.000000 -0.014151 0.003004 0.594193 0.492721 0.334996 0.495254 -0.594193
TrainingTimesLastYear 0.000316 -0.051757 -0.011339 -0.024848 -0.023749 NaN 0.026502 -0.011659 0.000292 0.002014 ... NaN 0.003388 -0.014151 1.000000 0.023690 0.001389 0.004581 0.010215 -0.011628 -0.001389
WorkLifeBalance -0.003707 -0.051951 -0.040352 -0.020402 0.017350 NaN 0.009994 0.027169 -0.010003 -0.019889 ... NaN -0.016568 0.003004 0.023690 1.000000 0.004675 0.023214 0.002151 -0.004591 -0.004675
YearsAtCompany 0.251686 -0.190419 -0.009778 0.010513 0.064196 NaN 0.013205 0.008425 -0.029032 0.013836 ... NaN 0.064974 0.594193 0.001389 0.004675 1.000000 0.854000 0.519966 0.842803 -1.000000
YearsInCurrentRole 0.197978 -0.180623 0.007208 0.013708 0.054567 NaN -0.001079 0.020140 -0.034016 0.015548 ... NaN 0.071627 0.492721 0.004581 0.023214 0.854000 1.000000 0.505657 0.724754 -0.854000
YearsSinceLastPromotion 0.173647 -0.053273 -0.037631 -0.004685 0.032203 NaN 0.007857 0.026082 -0.052412 -0.008307 ... NaN 0.027502 0.334996 0.010215 0.002151 0.519966 0.505657 1.000000 0.466713 -0.519966
YearsWithCurrManager 0.194818 -0.175355 -0.004717 0.004448 0.051292 NaN -0.005138 -0.001732 -0.013811 0.037397 ... NaN 0.053646 0.495254 -0.011628 -0.004591 0.842803 0.724754 0.466713 1.000000 -0.842803
Fecha -0.251686 0.190419 0.009778 -0.010513 -0.064196 NaN -0.013205 -0.008425 0.029032 -0.013836 ... NaN -0.064974 -0.594193 -0.001389 -0.004675 -1.000000 -0.854000 -0.519966 -0.842803 1.000000

28 rows × 28 columns

Según los resultados de correlación que hemos obtenido, las variables que podrías considerar como más relevantes para analizar la atrición son aquellas que tienen una correlación positiva relativamente alta con "Attrition". Estas variables indican una relación directa con la probabilidad de atrición. Algunas de las variables con correlaciones positivas más altas nuestro caso son:

TotalWorkingYears: Número total de años trabajados. JobLevel: Nivel del puesto laboral. YearsInCurrentRole: Años en el puesto actual. MonthlyIncome: Ingreso mensual. Age: Edad. YearsWithCurrManager: Años con el actual gerente. StockOptionLevel: Nivel de opciones sobre acciones. YearsAtCompany: Años en la compañía. JobInvolvement: Nivel de implicación en el trabajo. JobSatisfaction: Satisfacción laboral. Estas variables pueden ser consideradas como indicadores clave para analizar la atrición en nuestro conjunto de datos. Podemos profundizar en el análisis de estas variables y examinar su relación con la atrición en mayor detalle. Teniendo siempre en cuenta que la importancia relativa de estas variables puede variar según el contexto y la naturaleza específica de nuestro conjunto de datos.

Además, es importante realizar un análisis más amplio e incorporar otras técnicas de modelado o análisis para obtener una comprensión completa de los factores que influyen en la atrición.

Análisis de la deserción de empleados por género.
¶

In [ ]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.read_csv(r'C:\Users\Admin\Desktop\IBM\Data\WA_Fn-UseC_-HR-Employee-Attrition.csv')

#Visualization to show Total Employees by Gender.
plt.figure(figsize=(14,6))
plt.subplot(1,2,1)
gender_attrition = df["Gender"].value_counts()
plt.title("Employees Distribution by Gender",fontweight="black",size=20)
plt.pie(gender_attrition, autopct="%.0f%%",labels=gender_attrition.index,textprops=({"fontweight":"black","size":20}),
        explode=[0,0.1],startangle=90,colors= ["#D4A1E7","brown"])


#Visualization to show Employee Attrition by Gender.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_1 = df["Gender"].value_counts()
value_2 = new_df["Gender"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index, y=value_2.values,palette=["#D4A1E7","brown"])
plt.title("Employee Attrition Rate by Gender",fontweight="black",size=20,pad=20)
for index,value in enumerate(value_2):
    plt.text(index,value,str(value)+" ("+str(int(attrition_rate[index]))+"% )",ha="center",va="bottom",
             size=15,fontweight="black")
plt.tight_layout()
plt.show()

💬 Conclusion:

1. El número de empleados masculinos en la organización representa una proporción mayor que la de empleadas femeninas en más del 20%.
2. Los empleados masculinos se van más de la organización en comparación con las empleadas femeninas.

Analisis de la Attrition por edad.
¶

In [ ]:
#Visualization to show Employee Distribution by Age.
plt.figure(figsize=(13.5,6))
plt.subplot(1,2,1)
sns.histplot(x="Age",hue="Attrition",data=df,kde=True,palette=["blue","orange"])
plt.title("Employee Distribution by Age",fontweight="black",size=20,pad=10)


#Visualization to show Employee Distribution by Age & Attrition.
plt.subplot(1,2,2)
sns.boxplot(x="Attrition",y="Age",data=df,palette=["orange","blue"])
plt.title("Employee Distribution by Age & Attrition",fontweight="black",size=20,pad=10)
plt.tight_layout()
plt.show()

💬 Conclusion:

  1. La mayoría de los empleados tienen entre 30 y 40 años.
  2. Podemos observar claramente una tendencia de que a medida que aumenta la edad, la deserción disminuye.
  3. Del diagrama de caja también podemos observar que la edad media del empleado que dejó la organización es menor que la de los empleados que están trabajando en la organización.
  4. Los empleados jóvenes abandonan más la empresa en comparación con los empleados mayores.

Analisis Attrition y Business Travel.
¶

In [ ]:
#Visualization to show Total Employees by Businees Travel.
plt.figure(figsize=(14,6))
plt.subplot(1,2,1)
value_1 = df["BusinessTravel"].value_counts()
plt.title("Employees by Business Travel", fontweight="black", size=20, pad=20)
plt.pie(value_1.values, labels=value_1.index, autopct="%.1f%%",pctdistance=0.75,startangle=90,
        colors=['green', 'blue', 'brown'],textprops={"fontweight":"black","size":15})
center_circle = plt.Circle((0, 0), 0.4, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

#Visualization to show Attrition Rate by Businees Travel.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["BusinessTravel"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index,y=value_2.values,palette=["green","blue","brown"])
plt.title("Attrition Rate by Businees Travel",fontweight="black",size=20,pad=20)
for index,value in enumerate(value_2):
    plt.text(index,value,str(value)+" ("+str(int(attrition_rate[index]))+"% )",ha="center",va="bottom",
             size=15,fontweight="black")
plt.tight_layout()
plt.show()

💬 Conclusion:

1. La mayoría de los empleados de la organización rara vez viajan.
2. Los empleados que viajan con frecuencia pueden observar el mayor desgaste de los empleados.
3. La menor deserción de empleados puede ser observada por aquellos empleados que no viajan.

💬 Solución:

  1. La organización puede dividir los viajes entre los empleados para reducir la carga de los empleados que viajan con frecuencia.
  2. Esto definitivamente ayudará a reducir la tasa de deserción con respecto a los viajes de negocios.

Analisis Attrition por Department.
¶

In [ ]:
#Visualization to show Total Employees by Department.
plt.figure(figsize=(14,6))
plt.subplot(1,2,1)
value_1 = df["Department"].value_counts()
sns.barplot(x=value_1.index, y=value_1.values,palette = ["orange", "red", "blue"])
plt.title("Employees by Department",fontweight="black",size=20,pad=20)
for index,value in enumerate(value_1.values):
    plt.text(index,value,value,ha="center",va="bottom",fontweight="black",size=15,)

#Visualization to show Employee Attrition Rate by Department.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["Department"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index, y=value_2.values,palette=["blue","orange","red"])
plt.title("Attrition Rate by Department",fontweight="black",size=20,pad=20)
for index,value in enumerate(value_2):
    plt.text(index,value,str(value)+" ("+str(attrition_rate[index])+"% )",ha="center",va="bottom",
             size=15,fontweight="black")
plt.tight_layout()
plt.show()

💬 Conclusion:

1. La mayoría de los empleados son del Departamento de Investigación y Desarrollo.
2. La deserción más alta está en el Departamento de Ventas.
3. La tasa de deserción del Departamento de Recursos Humanos también es muy alta.
4. Aunque entre los empleados más altos en el departamento de Investigación y Desarrollo, hay menos deserción en comparación con otros departamentos.

Analisis de Attrition por DailyRate.
¶

  • Nota:
    1. DailyRate muestra la tasa de salario diario de los empleados.
    2. Para generar información significativa, podemos dividir Tarifas diarias en tres grupos para un análisis significativo.
In [ ]:
df["DailyRate"].describe().T
Out[ ]:
count    1470.000000
mean      802.485714
std       403.509100
min       102.000000
25%       465.000000
50%       802.000000
75%      1157.000000
max      1499.000000
Name: DailyRate, dtype: float64
In [ ]:
# Define the bin edges for the groups
bin_edges = [0, 500, 1000, 1500]

# Define the labels for the groups
bin_labels = ['Low DailyRate', 'Average DailyRate', 'High DailyRate']

# Cut the DailyRate column into groups
df['DailyRateGroup'] = pd.cut(df['DailyRate'], bins=bin_edges, labels=bin_labels)
In [ ]:
##Visualization to show Total Employees by DailyRateGroup.
plt.figure(figsize=(13,6))
plt.subplot(1,2,1)
value_1 = df["DailyRateGroup"].value_counts()
plt.pie(value_1.values, labels=value_1.index,autopct="%.2f%%",textprops={"fontweight":"black","size":15},
        explode=[0,0.1,0.1],colors= ['#FF8000', 'blue', 'brown', 'orange'])
plt.title("Employees by DailyRateGroup",fontweight="black",pad=15,size=18)


#Visualization to show Attrition Rate by DailyRateGroup.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["DailyRateGroup"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index.tolist(),y= value_2.values,palette=["#11264e","#6faea4","#FEE08B"])
plt.title("Employee Attrition Rate by DailyRateGroup",fontweight="black",pad=15,size=18)
for index,value in enumerate(value_2.values):
    plt.text(index,value, str(value)+" ("+str(attrition_rate[index])+"%)",ha="center",va="bottom",fontweight="black",size=15)

plt.tight_layout()
plt.show()

💬 Conclusion:

1. Los empleados con tarifa diaria promedio y tarifa diaria alta son aproximadamente iguales.
2. Pero la tasa de deserción es muy alta entre los empleados con Tarifa diaria promedio en comparación con los empleados con Tarifa diaria alta. 3. La tasa de deserción también es alta entre los empleados con un índice diario bajo. 4. La mayoría de los empleados que no obtienen una tarifa diaria alta abandonan la organización.

Analisis de Attrition por Distance From Home.
¶

In [ ]:
print("Total Unique Values in Attribute is =>",df["DistanceFromHome"].nunique())
Total Unique Values in Attribute is => 29
In [ ]:
df["DistanceFromHome"].describe().T
Out[ ]:
count    1470.000000
mean        9.192517
std         8.106864
min         1.000000
25%         2.000000
50%         7.000000
75%        14.000000
max        29.000000
Name: DistanceFromHome, dtype: float64
In [ ]:
# Define the bin edges for the groups
bin_edges = [0,2,5,10,30]

# Define the labels for the groups
bin_labels = ['0-2 kms', '3-5 kms', '6-10 kms',"10+ kms"]

# Cuttinf the DistaanceFromHome column into groups
df['DistanceGroup'] = pd.cut(df['DistanceFromHome'], bins=bin_edges, labels=bin_labels)
In [ ]:
##Visualization to show Total Employees by DistnaceFromHome.
plt.figure(figsize=(14,6))
plt.subplot(1,2,1)
value_1 = df["DistanceGroup"].value_counts()
sns.barplot(x=value_1.index.tolist(), y=value_1.values,palette = ["blue", "orange", "green","#87CEFA"])
plt.title("Employees by Distance From Home",fontweight="black",pad=15,size=18)
for index, value in enumerate(value_1.values):
    plt.text(index,value,value,ha="center",va="bottom",fontweight="black",size=15)
    
#Visualization to show Attrition Rate by DistanceFromHome.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["DistanceGroup"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index.tolist(),y= value_2.values,palette=["#11264e","#6faea4","#FEE08B","#D4A1E7","#E7A1A1"])
plt.title("Attrition Rate by DistanceFromHome",fontweight="black",pad=15,size=18)
for index,value in enumerate(value_2.values):
    plt.text(index,value, str(value)+" ("+str(attrition_rate[index])+"%)",ha="center",va="bottom",fontweight="black",size=15)

plt.tight_layout()
plt.show()

💬 Conclusion:

1. En la organización hay todo tipo de empleados que se quedan cerca o lejos de la oficina.
2. La función Distancia desde casa no sigue ninguna tendencia en la tasa de deserción.
3. La mayoría de los empleados que permanecen cerca de la organización se van en comparación con los empleados que permanecen lejos de la organización.

Analisis de Attrition por Education.
¶

In [ ]:
#Visualization to show Total Employees by Education.
plt.figure(figsize=(13.5,6))
plt.subplot(1,2,1)
value_1 = df["Education"].value_counts()
sns.barplot(x=value_1.index,y=value_1.values,order=value_1.index,palette = ["#FFA07A", "#D4A1E7", "#FFC0CB","#87CEFA"])
plt.title("Employees Distribution by Education",fontweight="black",size=20,pad=15)
for index,value in enumerate(value_1.values):
    plt.text(index,value,value,ha="center",va="bottom",fontweight="black",size=15)
    
#Visualization to show Employee Attrition by Education.
plt.subplot(1,2,2)
value_2 = new_df["Education"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index,y=value_2.values,order=value_2.index,palette=["#11264e","#6faea4","#FEE08B","#D4A1E7","#E7A1A1"])
plt.title("Employee Attrition by Education",fontweight="black",size=18,pad=15)
for index,value in enumerate(value_2.values):
    plt.text(index,value,str(value)+" ("+str(attrition_rate[index])+"%)",ha="center",va="bottom",
             fontweight="black",size=13)
plt.tight_layout()
plt.show()

💬 Conclusion:

1. La mayoría de los empleados de la organización han completado una licenciatura o una maestría como calificación educativa.
2. Muy pocos empleados en la organización han completado un doctorado como calificación educativa.
3. Podemos observar una tendencia de disminución en la tasa de deserción a medida que aumenta la calificación educativa.

Analisis de Attrition por Education Field.
¶

In [ ]:
#Visualization to show Total Employees by Education Field.
plt.figure(figsize=(13.5,8))
plt.subplot(1,2,1)
value_1 = df["EducationField"].value_counts()
sns.barplot(x=value_1.index, y=value_1.values,order=value_1.index,palette = ["yellow", "blue", "green","brown"])
plt.title("Employees by Education Field",fontweight="black",size=20,pad=15)
for index,value in enumerate(value_1.values):
    plt.text(index,value,value,ha="center",va="bottom",fontweight="black",size=15)
plt.xticks(rotation=90)

#Visualization to show Employee Attrition by Education Field.
plt.subplot(1,2,2)
value_2 = new_df["EducationField"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index,y=value_2.values,order=value_2.index,palette=["blue","yellow","brown","orange"])
plt.title("Employee Attrition by Education Field",fontweight="black",size=18,pad=15)
for index,value in enumerate(value_2.values):
    plt.text(index,value,str(value)+" ("+str(attrition_rate[index])+"%)",ha="center",va="bottom",
             fontweight="black",size=13)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

💬 Conclusion:

1. La mayoría de los empleados pertenecen al campo de las ciencias biológicas o de la educación médica.
2. Muy pocos empleados pertenecen al campo de la educación en recursos humanos.
3. Los campos de educación como recursos humanos, marketing y tecnología tienen una tasa de deserción muy alta.
4. Esto puede deberse a la carga de trabajo porque hay muy pocos empleados en estos campos educativos en comparación con el campo educativo con una tasa de deserción menor.

Analisis de Attrition por Environment Satisfaction.
¶

In [ ]:
#Visualization to show Total Employees by EnvironmentSatisfaction.
plt.figure(figsize=(14,6))
plt.subplot(1,2,1)
value_1 = df["EnvironmentSatisfaction"].value_counts()
plt.title("Employees by EnvironmentSatisfaction", fontweight="black", size=20, pad=20)
plt.pie(value_1.values, labels=value_1.index, autopct="%.1f%%",pctdistance=0.75,startangle=90,
        colors=['#E84040', '#E96060', '#E88181'],textprops={"fontweight":"black","size":15})
center_circle = plt.Circle((0, 0), 0.4, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

    
#Visualization to show Attrition Rate by EnvironmentSatisfaction.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["EnvironmentSatisfaction"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index,y=value_2.values,order=value_2.index,palette=["#11264e","#6faea4","#FEE08B","#D4A1E7","#E7A1A1"])
plt.title("Attrition Rate by Environment Satisfaction",fontweight="black",size=20,pad=20)
for index,value in enumerate(value_2):
    plt.text(index,value,str(value)+" ("+str(attrition_rate[index])+"% )",ha="center",va="bottom",
             size=15,fontweight="black")
plt.tight_layout()
plt.show()

💬 Conclusion:

1. La mayoría de los empleados han calificado la satisfacción del entorno de la organización como Alta y Muy alta.
2. Aunque la satisfacción del entorno de la organización es alta, todavía hay una gran deserción en este entorno.
3. La tasa de deserción aumenta con el aumento del nivel de satisfacción ambiental.

Analisis de Attrition por Job Roles.
¶

In [ ]:
##Visualization to show Total Employees by JobRole.
plt.figure(figsize=(13,8))
plt.subplot(1,2,1)
value_1 = df["JobRole"].value_counts()
sns.barplot(x=value_1.index.tolist(), y=value_1.values,palette = ["#FFA07A", "#D4A1E7", "#FFC0CB","#87CEFA"])
plt.title("Employees by Job Role",fontweight="black",pad=15,size=18)
plt.xticks(rotation=90)
for index, value in enumerate(value_1.values):
    plt.text(index,value,value,ha="center",va="bottom",fontweight="black",size=15)
    
#Visualization to show Attrition Rate by JobRole.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["JobRole"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index.tolist(), y=value_2.values,palette=["#11264e","#6faea4","#FEE08B","#D4A1E7","#E7A1A1"])
plt.title("Employee Attrition Rate by JobRole",fontweight="black",pad=15,size=18)
plt.xticks(rotation=90)
for index,value in enumerate(value_2.values):
    plt.text(index,value, str(value)+" ("+str(int(attrition_rate[index]))+"%)",ha="center",va="bottom",
             fontweight="black",size=10)

plt.tight_layout()
plt.show()

💬 Conclusion:

1. La mayoría de los empleados trabajan como ejecutivos de ventas, científicos investigadores o técnicos de laboratorio. en esta organización.
2. Las tasas de deserción más altas se encuentran en el sector de Director de Investigación, Ejecutivo de Ventas, Científico Investigador.

Analisis de Attrition po Job Level.
¶

In [ ]:
#Visualization to show Total Employees by Job Level.
plt.figure(figsize=(14,6))
plt.subplot(1,2,1)
value_1 = df["JobLevel"].value_counts()
plt.title("Employees by Job Level", fontweight="black", size=20, pad=20)
plt.pie(value_1.values, labels=value_1.index, autopct="%.1f%%",pctdistance=0.8,startangle=90,
        colors=['#FF6D8C', '#FF8C94', '#FFAC9B', '#FFCBA4',"#FFD8B1"],textprops={"fontweight":"black","size":15})
center_circle = plt.Circle((0, 0), 0.4, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

    
#Visualization to show Attrition Rate by JobLevel.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["JobLevel"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index,y=value_2.values,order=value_2.index,palette=["#11264e","#6faea4","#FEE08B","#D4A1E7","#E7A1A1"])
plt.title("Attrition Rate by Job Level",fontweight="black",size=20,pad=20)
for index,value in enumerate(value_2):
    plt.text(index,value,str(value)+" ("+str(attrition_rate[index])+"% )",ha="center",va="bottom",
             size=15,fontweight="black")
plt.tight_layout()
plt.show()

💬 Conclusion:

  1. La mayoría de los empleados de la organización están en el nivel de entrada o en el nivel junior.
  2. La deserción más alta está en el nivel de entrada.
  3. A medida que aumenta el nivel, la tasa de deserción disminuye.

Analisis de Attrition por Job Satisfaction.
¶

In [ ]:
#Visualization to show Total Employees by Job Satisfaction.
plt.figure(figsize=(14,6))
plt.subplot(1,2,1)
value_1 = df["JobSatisfaction"].value_counts()
plt.title("Employees by Job Satisfaction", fontweight="black", size=20, pad=20)
plt.pie(value_1.values, labels=value_1.index, autopct="%.1f%%",pctdistance=0.8,startangle=90,
        colors=['#FFB300', '#FFC300', '#FFD700', '#FFFF00'],textprops={"fontweight":"black","size":15})
center_circle = plt.Circle((0, 0), 0.4, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

    
#Visualization to show Attrition Rate by Job Satisfaction.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["JobSatisfaction"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index,y=value_2.values,order=value_2.index,palette=["#11264e","#6faea4","#FEE08B","#D4A1E7","#E7A1A1"])
plt.title("Attrition Rate by Job Satisfaction",fontweight="black",size=20,pad=20)
for index,value in enumerate(value_2):
    plt.text(index,value,str(value)+" ("+str(attrition_rate[index])+"% )",ha="center",va="bottom",
             size=15,fontweight="black")
plt.tight_layout()
plt.show()

💬 Conclusion:

1. La mayoría de los empleados han calificado su satisfacción laboral como alta o muy alta.
2. La mayoría de los empleados que calificaron su satisfacción laboral como baja abandonan la organización.
3. Todas las categorías de satisfacción laboral tienen una alta tasa de deserción.

Analisis de Attrition por Marital Status.
¶

In [ ]:
#Visualization to show Total Employees by MaritalStatus.
plt.figure(figsize=(14,6))
plt.subplot(1,2,1)
value_1 = df["MaritalStatus"].value_counts()
plt.title("Employees by MaritalStatus", fontweight="black", size=20, pad=20)
plt.pie(value_1.values, labels=value_1.index, autopct="%.1f%%",pctdistance=0.75,startangle=90,
        colors=['#E84040', '#E96060', '#E88181', '#E7A1A1'],textprops={"fontweight":"black","size":15})
center_circle = plt.Circle((0, 0), 0.4, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

    
#Visualization to show Attrition Rate by MaritalStatus.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["MaritalStatus"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index, y=value_2.values,palette=["#11264e","#6faea4","#FEE08B","#D4A1E7","#E7A1A1"])
plt.title("Attrition Rate by MaritalStatus",
          fontweight="black",
          size=20,pad=20)
for index,value in enumerate(value_2):
    plt.text(index,value,str(value)+" ("+str(attrition_rate[index])+"% )",ha="center",va="bottom",
             size=15,fontweight="black")
plt.tight_layout()
plt.show()

💬 Conclusion:

1. La mayoría de los empleados están casados ​​en la organización.
2. La tasa de deserción es muy alta entre los empleados divorciados.
3. La tasa de deserción es baja para los empleados solteros.

Analisis de Attrition por Monthly Income.
¶

In [ ]:
#Visualization to show Employee Distribution by MonthlyIncome.
plt.figure(figsize=(13,6))
plt.subplot(1,2,1)
sns.histplot(x="MonthlyIncome", hue="Attrition", kde=True ,data=df,palette=["#11264e","#6faea4"])
plt.title("Employee Attrition by Monthly Income",fontweight="black",size=20,pad=15)

#Visualization to show Employee Attrition by Monthly Income.
plt.subplot(1,2,2)
sns.boxplot(x="Attrition",y="MonthlyIncome",data=df,palette=["#D4A1E7","#6faea4"])
plt.title("Employee Attrition by Monthly Income",fontweight="black",size=20,pad=15)
plt.tight_layout()
plt.show()

💬 Conclusion:

1. A la mayoría de los empleados se les paga menos de 10000 en la organización.
2. El ingreso mensual promedio de los empleados que se fueron es comparativamente bajo con los empleados que todavía están trabajando.
3. A medida que aumenta el ingreso mensual, disminuye la deserción.

Analisis de Attrition por Monthly Rate.
¶

In [ ]:
plt.figure(figsize=(13,6))
plt.subplot(1,2,1)
sns.histplot(x="MonthlyRate", hue="Attrition", data=df,kde=True, palette=["#11264e","#6faea4"])
plt.title("Employee Attrition by Monthly Rate",fontweight="black",size=20,pad=15)

plt.subplot(1,2,2)
sns.boxplot(x="Attrition",y="MonthlyRate",data=df,palette=["#D4A1E7","#6faea4"])
plt.title("Employee Attrition by Monthly Rate",fontweight="black",size=20,pad=15)
plt.tight_layout()
plt.show()

💬 Conclusion:

1. La distribución de MonthlyRate es similar en toda la columna.
2. Por lo tanto, esta función no brinda información significativa sobre el desgaste de los empleados.

Analisis de Attrition por Number of Companies Worked.
¶

In [ ]:
df["NumCompaniesWorked"].describe().T
Out[ ]:
count    1470.000000
mean        2.693197
std         2.498009
min         0.000000
25%         1.000000
50%         2.000000
75%         4.000000
max         9.000000
Name: NumCompaniesWorked, dtype: float64
In [ ]:
# Define the bin edges for the groups
bin_edges = [0, 1, 3, 5, 10]

# Define the labels for the groups
bin_labels = ['0-1 Companies', '2-3 companies', '4-5 companies', "5+ companies"]

# Cut the DailyRate column into groups
df["NumCompaniesWorkedGroup"] = pd.cut(df['NumCompaniesWorked'], bins=bin_edges, labels=bin_labels)
In [ ]:
#Visualization to show Total Employees by NumCompaniesWorked.
plt.figure(figsize=(13,6))
plt.subplot(1,2,1)
value_1 = df["NumCompaniesWorkedGroup"].value_counts()
plt.title("Employees by Companies Worked", fontweight="black", size=20, pad=20)
plt.pie(value_1.values, labels=value_1.index, autopct="%.1f%%",pctdistance=0.75,startangle=90,
        colors=['#FF6D8C', '#FF8C94', '#FFAC9B', '#FFCBA4'],textprops={"fontweight":"black","size":15})
center_circle = plt.Circle((0, 0), 0.4, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

    
#Visualization to show Attrition Rate by NumCompaniesWorked.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["NumCompaniesWorkedGroup"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index.tolist(), y=value_2.values,palette=["#11264e","#6faea4","#FEE08B","#D4A1E7","#E7A1A1"])
plt.title("Attrition Rate by Companies Worked",fontweight="black",size=20,pad=20)
for index,value in enumerate(value_2):
    plt.text(index,value,str(value)+" ("+str(int(attrition_rate[index]))+"%)",ha="center",va="bottom",
             size=15,fontweight="black")
plt.xticks(size=12)
plt.tight_layout()
plt.show()

💬 Conclusion:

1. La mayoría de los empleados han trabajado para menos de 2 empresas.
2. Hay una alta tasa de deserción de empleados que han estado en menos de 5 empresas.

Analisis de Attrition por Over Time.
¶

In [ ]:
#Visualization to show Total Employees by OverTime.
plt.figure(figsize=(15,6))
plt.subplot(1,2,1)
value_1 = df["OverTime"].value_counts()
plt.title("Employees by OverTime", fontweight="black", size=20, pad=20)
plt.pie(value_1.values, labels=value_1.index, autopct="%.1f%%",pctdistance=0.75,startangle=90,
        colors=["#ffb563","#FFC0CB"],textprops={"fontweight":"black","size":15})
center_circle = plt.Circle((0, 0), 0.4, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

    
#Visualization to show Attrition Rate by OverTime.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["OverTime"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index.tolist(), y=value_2.values,palette=["#D4A1E7","#E7A1A1"])
plt.title("Attrition Rate by OverTime",fontweight="black",size=20,pad=20)
for index,value in enumerate(value_2):
    plt.text(index,value,str(value)+" ("+str(int(attrition_rate[index]))+"%)",ha="center",va="bottom",
             size=15,fontweight="black")
plt.xticks(size=13)
plt.tight_layout()
plt.show()

💬 Conclusion:

1. La mayoría de los empleados no trabajan para OverTime.
2. La característica OverTime tiene un desequilibrio de clase muy alto debido a que no podemos obtener información significativa.

Analisis de Attrition por Percentage Salary Hike.
¶

In [ ]:
#Visualization to show Employee Distribution by Percentage Salary Hike.
plt.figure(figsize=(16,6))
sns.countplot(x="PercentSalaryHike", hue="Attrition", data=df, palette=["green","brown"])
plt.title("Employee Attrition By PercentSalaryHike",fontweight="black",size=20,pad=15)
plt.show()

💬 Conclusion:

1. Muy pocos empleados obtienen un alto porcentaje de aumento de salario.
2. A medida que aumenta la cantidad de porcentaje de salario, la tasa de deserción disminuye.

Analisis de Attrition por Performance Rating.
¶

In [ ]:
#Visualization to show Total Employees by PerformanceRating.
plt.figure(figsize=(14,6))
plt.subplot(1,2,1)
value_1 = df["PerformanceRating"].value_counts()
plt.title("Employees by PerformanceRating", fontweight="black", size=20, pad=20)
plt.pie(value_1.values, labels=value_1.index, autopct="%.1f%%",pctdistance=0.75,startangle=90,
        colors=["#ffb563","#FFC0CB"],textprops={"fontweight":"black","size":15})
center_circle = plt.Circle((0, 0), 0.4, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

    
#Visualization to show Attrition Rate by PerformanceRating.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["PerformanceRating"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index.tolist(),y= value_2.values,palette=["#D4A1E7","#E7A1A1"])
plt.title("Attrition Rate by PerformanceRating",fontweight="black",size=20,pad=20)
for index,value in enumerate(value_2):
    plt.text(index,value,str(value)+" ("+str(int(attrition_rate[index]))+"%)",ha="center",va="bottom",
             size=15,fontweight="black")
plt.tight_layout()
plt.show()

💬 Conclusion:

1. La mayoría de los empleados tienen una calificación de desempeño excelente.
2. Ambas categorías en este campo tienen la misma tasa de deserción.
3. Es por eso que no podemos generar información significativa

Analisis de Attrition por Relationship Satisfaction.
¶

In [ ]:
#Visualization to show Total Employees by RelationshipSatisfaction.
plt.figure(figsize=(13,6))
plt.subplot(1,2,1)
value_1 = df["RelationshipSatisfaction"].value_counts()
plt.title("Employees by RelationshipSatisfaction", fontweight="black", size=20, pad=20)
plt.pie(value_1.values, labels=value_1.index, autopct="%.1f%%",pctdistance=0.75,startangle=90,
        colors=['#6495ED', '#87CEEB', '#00BFFF', '#1E90FF'],textprops={"fontweight":"black","size":15})
center_circle = plt.Circle((0, 0), 0.4, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

    
#Visualization to show Attrition Rate by RelationshipSatisfaction.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["RelationshipSatisfaction"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index, y=value_2.values,order=value_2.index,palette=["#11264e","#6faea4","#FEE08B","#D4A1E7","#E7A1A1"])
plt.title("Attrition Rate by RelationshipSatisfaction",fontweight="black",size=20,pad=20)
for index,value in enumerate(value_2):
    plt.text(index,value,str(value)+" ("+str(int(attrition_rate[index]))+"%)",ha="center",va="bottom",
             size=15,fontweight="black")
plt.tight_layout()
plt.show()

💬 Conclusion:

1. La mayoría de los empleados tienen una relación alta o muy alta.
2. Aunque la satisfacción de la relación es alta, hay una alta tasa de deserción.
3. Todas las categorías en esta función tienen una alta tasa de deserción.

Analisis de Attrition por Work Life Balance.
¶

In [ ]:
##Visualization to show Total Employees by WorkLifeBalance.
plt.figure(figsize=(14.5,6))
plt.subplot(1,2,1)
value_1 = df["WorkLifeBalance"].value_counts()
plt.title("Employees by WorkLifeBalance", fontweight="black", size=20, pad=20)
plt.pie(value_1.values, labels=value_1.index, autopct="%.1f%%",pctdistance=0.75,startangle=90,
        colors= ['#FF8000', '#FF9933', '#FFB366', '#FFCC99'],textprops={"fontweight":"black","size":15})
center_circle = plt.Circle((0, 0), 0.4, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

#Visualization to show Attrition Rate by WorkLifeBalance.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["WorkLifeBalance"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index, y=value_2.values,order=value_2.index,palette=["#11264e","#6faea4","#FEE08B","#D4A1E7","#E7A1A1"])
plt.title("Employee Attrition Rate by WorkLifeBalance",fontweight="black",pad=15,size=18)
for index,value in enumerate(value_2.values):
    plt.text(index,value, str(value)+" ("+str(attrition_rate[index])+"%)",ha="center",va="bottom",
             fontweight="black",size=15)
plt.tight_layout()
plt.show()

💬 Conclusion:

1. Más del 60 % de los empleados tienen un mejor equilibrio entre la vida laboral y personal.
2. Los empleados con un mal equilibrio entre la vida laboral y personal tienen una tasa de deserción muy alta.
3. Otras categorías también tienen una alta tasa de deserción.

Analyzing de Attrition por Total Working Years.
¶

In [ ]:
# Define the bin edges for the groups
bin_edges = [0, 5, 10, 20, 50]

# Define the labels for the groups
bin_labels = ['0-5 years', '5-10 years', '10-20 years', "20+ years"]

# Cut the DailyRate column into groups
df["TotalWorkingYearsGroup"] = pd.cut(df['TotalWorkingYears'], bins=bin_edges, labels=bin_labels)
In [ ]:
#Visualization to show Total Employees by TotalWorkingYearsGroup.
plt.figure(figsize=(14,6))
plt.subplot(1,2,1)
value_1 = df["TotalWorkingYearsGroup"].value_counts()
plt.title("Employees by TotalWorkingYears", fontweight="black", size=20, pad=20)
plt.pie(value_1.values, labels=value_1.index, autopct="%.1f%%",pctdistance=0.75,startangle=90,
        colors=['#E84040', '#E96060', '#E88181', '#E7A1A1'],textprops={"fontweight":"black","size":15})
center_circle = plt.Circle((0, 0), 0.4, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

    
#Visualization to show Attrition Rate by TotalWorkingYearsGroup.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["TotalWorkingYearsGroup"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index.tolist(), y=value_2.values,palette=["#11264e","#6faea4","#FEE08B","#D4A1E7","#E7A1A1"])
plt.title("Attrition Rate by TotalWorkingYears",fontweight="black",size=20,pad=20)
for index,value in enumerate(value_2):
    plt.text(index,value,str(value)+" ("+str(int(attrition_rate[index]))+"%)",ha="center",va="bottom",
             size=15,fontweight="black")
plt.tight_layout()
plt.show()

💬 Conclusion:

  1. La mayoría de los empleados tienen un total de 5 a 10 años de experiencia laboral. Pero su Tasa de Deserción también es muy alta.
  2. Los empleados con experiencia laboral de menos de 10 años tienen una alta tasa de deserción.
  3. Los empleados con experiencia laboral de más de 10 años tienen una tasa de deserción menor.

Analyzing de Attrition por Years at Company.
¶

In [ ]:
# Define the bin edges for the groups
bin_edges = [0, 1, 5, 10, 20]

# Define the labels for the groups
bin_labels = ['0-1 years', '2-5 years', '5-10 years', "10+ years"]

# Cut the DailyRate column into groups
df["YearsAtCompanyGroup"] = pd.cut(df['YearsAtCompany'], bins=bin_edges, labels=bin_labels)
In [ ]:
#Visualization to show Total Employees by YearsAtCompanyGroup.
plt.figure(figsize=(14,6))
plt.subplot(1,2,1)
value_1 = df["YearsAtCompanyGroup"].value_counts()
plt.title("Employees by YearsAtCompany", fontweight="black", size=20, pad=20)
plt.pie(value_1.values, labels=value_1.index, autopct="%.1f%%",pctdistance=0.75,startangle=90,
        colors=['#FFB300', '#FFC300', '#FFD700', '#FFFF00'],textprops={"fontweight":"black","size":15})
center_circle = plt.Circle((0, 0), 0.4, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

    
#Visualization to show Attrition Rate by YearsAtCompanyGroup.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["YearsAtCompanyGroup"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index.tolist(), y=value_2.values,palette=["#11264e","#6faea4","#FEE08B","#D4A1E7","#E7A1A1"])
plt.title("Attrition Rate by YearsAtCompany",fontweight="black",size=20,pad=20)
for index,value in enumerate(value_2):
    plt.text(index,value,str(value)+" ("+str(int(attrition_rate[index]))+"%)",ha="center",va="bottom",
             size=15,fontweight="black")
plt.tight_layout()
plt.show()

💬 Conclusion:

1. La mayoría de los empleados ha trabajado de 2 a 10 años en la organización.
2. Muy pocos empleados han trabajado menos de 1 año o más de 10 años.
3. Los empleados que han trabajado durante 2 a 5 años tienen una tasa de deserción muy alta.
4. Los empleados que han trabajado durante más de 10 años tienen una baja tasa de deserción.

Analyzing de Attrition por Years In Current Role.
¶

In [ ]:
# Define the bin edges for the groups
bin_edges = [0, 1, 5, 10, 20]

# Define the labels for the groups
bin_labels = ['0-1 years', '2-5 years', '5-10 years', "10+ years"]

# Cut the DailyRate column into groups
df["YearsInCurrentRoleGroup"] = pd.cut(df['YearsInCurrentRole'], bins=bin_edges, labels=bin_labels)
In [ ]:
#Visualization to show Total Employees by YearsInCurrentRoleGroup.
plt.figure(figsize=(14,6))
plt.subplot(1,2,1)
value_1 = df["YearsInCurrentRoleGroup"].value_counts()
plt.title("Employees by YearsInCurrentRole", fontweight="black", size=20, pad=20)
plt.pie(value_1.values, labels=value_1.index, autopct="%.1f%%",pctdistance=0.75,startangle=90,
        colors=['#6495ED', '#87CEEB', '#00BFFF', '#1E90FF'],textprops={"fontweight":"black","size":15,"color":"black"})
center_circle = plt.Circle((0, 0), 0.4, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

    
#Visualization to show Attrition Rate by YearsInCurrentRoleGroup.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["YearsInCurrentRoleGroup"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index.tolist(), y=value_2.values,palette= ["orange","green","yellow","blue","brown"])
plt.title("Attrition Rate by YearsInCurrentRole",fontweight="black",size=20,pad=20)
for index,value in enumerate(value_2):
    plt.text(index,value,str(value)+" ("+str(int(attrition_rate[index]))+"%)",ha="center",va="bottom",
             size=15,fontweight="black")
plt.tight_layout()
plt.show()

💬 Conclusion:

  1. La mayoría de los empleados ha trabajado durante 2 a 10 años en el mismo puesto en la organización.
  2. Muy pocos empleados han trabajado menos de 1 año o más de 10 años en el mismo puesto.
  3. Los empleados que han trabajado de 2 a 0 años en el mismo rol tienen una tasa de deserción muy alta.
  4. Los empleados que han trabajado durante más de 10 años en el mismo rol tienen una baja tasa de deserción.

Analisis de Attrition por Years Since Last Promotion.
¶

In [ ]:
# Define the bin edges for the groups
bin_edges = [0, 1, 5, 10, 20]

# Define the labels for the groups
bin_labels = ['0-1 years', '2-5 years', '5-10 years', "10+ years"]

# Cut the DailyRate column into groups
df["YearsSinceLastPromotionGroup"] = pd.cut(df['YearsSinceLastPromotion'], bins=bin_edges, labels=bin_labels)
In [ ]:
#Visualization to show Total Employees by YearsSinceLastPromotionGroup.
plt.figure(figsize=(14,6))
plt.subplot(1,2,1)
value_1 = df["YearsSinceLastPromotionGroup"].value_counts()
plt.title("Employees by YearsSinceLastPromotion", fontweight="black", size=20, pad=20)
plt.pie(value_1.values, labels=value_1.index, autopct="%.1f%%",pctdistance=0.75,startangle=90,
        colors=['#FF6D8C', '#FF8C94', '#FFAC9B', '#FFCBA4'],textprops={"fontweight":"black","size":15})
center_circle = plt.Circle((0, 0), 0.4, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

    
#Visualization to show Attrition Rate by YearsSinceLastPromotionGroup.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["YearsSinceLastPromotionGroup"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index.tolist(), y=value_2.values,palette=["#11264e","#6faea4","#FEE08B","#D4A1E7","#E7A1A1"])

plt.title("Attrition Rate by YearsSinceLastPromotion",fontweight="black",size=20,pad=20)
for index,value in enumerate(value_2):
    plt.text(index,value,str(value)+" ("+str(int(attrition_rate[index]))+"%)",ha="center",va="bottom",
             size=15,fontweight="black")
plt.tight_layout()
plt.show()

💬 Conclusion:

1. Casi el 36% de los empleados no han sido promovidos desde hace 2 a 5 años.
2. Casi el 8 % de los empleados no ha sido ascendido desde hace más de 10 años.
3. Todas las categorías en esta función tienen una alta tasa de deserción, especialmente empleados que no han sido ascendidos desde hace más de 5 años.

Analisis de Attrition por Years with Current Manager.
¶

In [ ]:
# Define the bin edges for the groups
bin_edges = [0, 1, 5, 10, 20]

# Define the labels for the groups
bin_labels = ['0-1 years', '2-5 years', '5-10 years', "10+ years"]

# Cut the DailyRate column into groups
df["YearsWithCurrManagerGroup"] = pd.cut(df['YearsWithCurrManager'], bins=bin_edges, labels=bin_labels)
In [ ]:
#Visualization to show Total Employees by YearsWithCurrManagerGroup.
plt.figure(figsize=(14,6))
plt.subplot(1,2,1)
value_1 = df["YearsWithCurrManagerGroup"].value_counts()
plt.title("Employees by YearsWithCurrManager", fontweight="black", size=20, pad=20)
plt.pie(value_1.values, labels=value_1.index, autopct="%.1f%%",pctdistance=0.75,startangle=90,
        colors= ['#FF8000', '#FF9933', '#FFB366', '#FFCC99'],textprops={"fontweight":"black","size":15})
center_circle = plt.Circle((0, 0), 0.4, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

    
#Visualization to show Attrition Rate by YearsWithCurrManagerGroup.
plt.subplot(1,2,2)
new_df = df[df["Attrition"]=="Yes"]
value_2 = new_df["YearsWithCurrManagerGroup"].value_counts()
attrition_rate = np.floor((value_2/value_1)*100).values
sns.barplot(x=value_2.index.tolist(), y=value_2.values,palette=["#11264e","#6faea4","#FEE08B","#D4A1E7","#E7A1A1"])
plt.title("Attrition Rate by YearsWithCurrManager",fontweight="black",size=20,pad=20)
for index,value in enumerate(value_2):
    plt.text(index,value,str(value)+" ("+str(int(attrition_rate[index]))+"%)",ha="center",va="bottom",
             size=15,fontweight="black")
plt.tight_layout()
plt.show()

💬 Conclusion:

1. Casi el 51 % de los empleados ha trabajado durante 2 a 5 años con el mismo gerente.
2. Casi el 38% de los empleados ha trabajado durante 5 a 10 años con el mismo gerente.
3. Los empleados que han trabajado durante más de 10 años con el mismo gerente tienen una tasa de deserción muy baja.
4. Otras categorías tienen una alta tasa de deserción.

Análisis estadístico: importancia de las características.
¶

💬 Conclusion:

- La prueba Anova se utiliza para analizar el impacto de diferentes características numéricas en una característica categórica de respuesta.
- La prueba Anova devuelve dos valores estadísticos f_score y p_value.
- Nota:
1. Una puntuación F mayor indica una asociación más fuerte entre la(s) variable(s) independiente(s) y la variable dependiente.
2. Si el valor p está por debajo del nivel de significancia elegido (p. ej., p menor que 0,05), entonces podemos rechazar nuestra hipótesis nula.

Esta función utiliza la biblioteca matplotlib y seaborn en Python para crear un gráfico de barras que compara los valores de puntuación F obtenidos en un test de análisis de varianza (ANOVA).

Aquí tienes un desglose de lo que hace cada línea de código:

plt.figure(figsize=(15,6)): Esta línea crea una nueva figura con un tamaño de 15 pulgadas de ancho y 6 pulgadas de alto. Proporciona un lienzo en el que se dibujará el gráfico de barras.

keys = list(f_scores.keys()): Se crea una lista de las claves (o etiquetas) de un diccionario llamado f_scores. Estas claves representarían las categorías o variables que se están comparando en el ANOVA.

values = list(f_scores.values()): Se crea una lista de los valores correspondientes a las claves del diccionario f_scores. Estos valores representarían los puntajes F obtenidos en el ANOVA para cada categoría o variable.

sns.barplot(x=keys, y=values): Se utiliza la función barplot de la biblioteca seaborn para crear el gráfico de barras. Los valores en el eje x son las claves y en el eje y están los valores de puntuación F.

plt.title("Anova-Test F_scores Comparison", fontweight="black", size=20, pad=15): Se establece el título del gráfico con el texto "Anova-Test F_scores Comparison". También se especifican algunos atributos del título, como el grosor de la fuente (fontweight), el tamaño (size) y el espaciado entre el título y el gráfico (pad).

plt.xticks(rotation=90): Se rota el texto de las etiquetas del eje x en 90 grados para que sean legibles si son demasiado largas.

El siguiente bucle for se encarga de añadir el valor de cada barra en el gráfico. Para cada barra, se muestra el valor en la parte superior de la misma.

plt.show(): Esta línea muestra el gráfico de barras completo.

En resumen, esta función toma los valores de puntuación F obtenidos en un ANOVA, los visualiza en un gráfico de barras utilizando seaborn y matplotlib, y añade etiquetas y título para una mejor comprensión de los datos.

In [ ]:
import numpy as np
df = pd.read_csv(r'C:\Users\Admin\Desktop\IBM\Data\WA_Fn-UseC_-HR-Employee-Attrition.csv')
num_cols = df.select_dtypes(np.number).columns
In [ ]:
new_df = df.copy()

new_df["Attrition"] = new_df["Attrition"].replace({"No":0,"Yes":1})
In [ ]:
import pandas as pd
from scipy import stats

f_scores = {}
p_values = {}

for column in num_cols:
    f_score, p_value = stats.f_oneway(new_df[column], new_df["Attrition"])
    
    f_scores[column] = f_score
    p_values[column] = p_value
In [ ]:
f_scores = {}
p_values = {}

for column in num_cols:
    f_score, p_value = stats.f_oneway(new_df[column],new_df["Attrition"])
    
    f_scores[column] = f_score
    p_values[column] = p_value

Visualizing the F_Score of ANOVA Test of Each Numerical features.
¶